













People Talking
When They Can’t
Hear Their Own
Voices¹




The situation of spontaneous speech usually includes two ubiquitous
conditions: people hear themselves as they talk and they see their interlocutors.
Primarily, this chapter concerns the role of the auditory feedback in the
control of spontaneous speech and interaction, and in determining subjective
experience. Very secondarily, it also deals with the role of the visual input
from the audience.


LITERATURE REVIEW 


The significance of feedback in the regulation of behavior has been
recognized implicitly or explicitly for a long time. On the physiological level, for
example, the early work of Bell (1826), Bernard (1858), and Sherrington and
Mott (1895) demonstrated the critical role of proprioceptive feedback in the
regulation of voluntary motor activity. The clinical condition of tabes dorsalis


¹Adapted from G. F. Mahl (1972), People Talking When They Can't Hear Their Own Voices. In
A. Siegman and B. Pope (1972, Eds.), Studies in Dyadic Communication. New York: Pergamon
Press. (chapter 10, pp. 211-264). Adapted with permission of the editors and publisher, with
slight editorial and stylistic revisions. The author presented a summary of the basic findings
reported in this chapter at the 16th International Congress of Psychology, Bonn, 1960 (Mahl,
1961b).

‘The author's interest in feedback is an old one. As mentioned in chapter 7 (fn. 2), 
publication concerned a technique for providing visible signs of muscular activity (Si 
Mahl, 1941). And during the early 1950's he explored the effect of playing back to patients
recordings of their psychotherapy interviews. 












did the same. Cannon’s elucidation of homeostatic mechanisms (1932) often 
dealt with complex feedback systems. 

Viewed retrospectively, it is clear that Freud’s psychological models also 
included feedback systems although they were not articulated as such. This 
is most apparent in the regulatory role Freud ascribed to self-observation. To
mention a few major instances, Freud explicitly or implicitly dealt with this
process in his hypothesis of Cs as a sense organ that monitors the workings of
the psychic apparatus (1900/1953a, pp. 615ff.); in his concept of dream
censorship (1900/1953a); in his hypotheses of the self-observing, self-eval- 
uating, and self-critical functions of the Super-ego (1914/1957a, 1923/1961); 
and in his revived and revised theory of anxiety and defense (1926/1959). 
One could say with some validity that one thing Freud did as he developed 
his theory was to articulate in more detail feedback processes involved in 
self-observation and self-regulation and to attribute greater significance to 
them. The relevance of psychoanalytic theory to feedback concepts and to 
this chapter is clearly implied in Freud's brief statement (1901/1960, p. 101) 
that “we may hear the stifled voice of the author's self-criticism” in the 
speech disturbances of everyday life and in contorted writing. In the Discus- 
sion, we consider further the relation of this chapter and psychoanalytic 
theory. 

Wiener (1948, 1950) pointed out the early application of feedback princi- 
ples in the use of the governor and the steering machine for the control of the 
speed of the steam engine and the position of the rudder. He cites Maxwell's 
paper (1868) on governors as the “first significant paper on feedback mecha- 
nisms” (1948, p. 19). One does not have to agree with this judgment to 
realize that in both the practical use and the theoretical understanding of 
machine behavior, “feedback” has a long history. 

In spite of this history, as well as that indicated for physiology and psy- 
chology, it was Norbert Wiener who articulated the feedback concept and 
with his colleagues (Rosenblueth, Wiener, & Bigelow, 1943) pioneered in 
showing the relevance of feedback theory to the behavior of machines and of 
man. 

With the appearance of Cybernetics (Wiener, 1948) and Lee’s related dis- 
covery of the effects of delayed auditory feedback (1950a, 1950b, 1951), a 
previously latent, and only sporadically manifested, interest in the role of 
auditory feedback in speech regulation gained momentum. The speech defi- 
cit of the deaf had always provided dramatic evidence that auditory feedback 
was critical for speech development and regulation. But it is largely due to 
the research of the past 30 years that a more exact picture of the feedback 
regulation of speech has begun to emerge. We review this research by sum- 
marizing the findings of studies in which (a) the temporal aspect of auditory 
feedback has been altered, (b) other aspects of auditory feedback have been 

changed, (c) auditory feedback has been abolished or masked. 

































































330 © 21. PEOPLE CAN'T HEAR THEIR OWN VOICES 


Effects of Changing the Timing of Auditory Feedback 
Delayed Auditory Feedback (DAF)²


. . . Black (1954), himself a pioneer investigator in this area, observed
that Fletcher’s notes contained a brief reference in 1918 to this problem as it 
might arise with increasing distance of side-tone transmission over telephone 
circuits. But Lee’s paper (1950a), which was actually a “letter to the editor,” 
is generally regarded as the first report on the effects of DAF. Nine years 
later Chase, Sutton, and First (1959) listed 101 references in their DAF 
bibliography; Smith’s integrative reviews (1962; Smith, Ansell, & Smith, 
1963) contain additional references. There is obviously a voluminous DAF 
literature. The reader is urged to consult Lee’s original papers and to use the 
bibliography of Chase et al. and Smith's reviews as guides to this literature, 
for only the major findings are cited here. 

Lee (1950a, 1950b) discovered that if a person’s speech was returned to
his ears through earphones at an amplified level and delayed by one-eighth 
or one-quarter second, the individual stuttered, spoke slower, increased the 
pitch or volume of his speech, and sometimes stopped speaking. A delay of 
only one-fifteenth second had little or no effect, which indicated that the 
time factor was critical. Marked speech disruption only occurs if the delayed 
feedback replaces normal feedback. The use of earphones and amplification
of the signal produces this condition, for the former decreases normal air- 
borne feedback and the latter masks bone-conduction feedback as well as 
any airborne feedback still effective through the earphones. 

Lee also observed other effects. Some subjects developed “a quavering 
slow speech” reminiscent of cerebral palsy. Emotional arousal was indicated 
by reddening of the face. Extension of the DAF condition for more than 2 
minutes caused physical fatigue. Individual differences in all the DAF ef- 
fects were apparent. 

Subsequent research has confirmed and refined Lee’s original observa- 
tions. The work of Fairbanks (1955) and Fairbanks and Guttman (1958) is 
particularly relevant. They verified the critical significance of the time fac- 
tor, finding that the general peak disturbance of speech during oral reading 
occurred with a delay of .2 seconds. They also demonstrated, however, that
different types of articulatory disturbances were maximal with slightly different
delay intervals. This finding emphasized the intricacy of the feedback control
of speech. 

Chase, Sutton, First, and Zubin (1961) compared the effect of DAF on 
the impromptu speech of children 4-6 years old and 7-9 years old. DAF 
caused the children of both age groups to repeat more words and syllables, to 
prolong more syllables, and to speak slower. The last two effects, however, 





²Also sometimes called “delayed sidetone.”







were significantly greater in the older than in the younger children. Corre- 
lated with this age difference, the older children indicated in an Inquiry 
greater dependence on and sensitivity to auditory feedback than did the 
younger children. Nearly all the older children recognized their own voices
through the earphones in both control and DAF conditions; only half of the 
younger children did in both conditions! The children were asked how the 
voice they heard in the DAF condition differed from that of the control 
condition. Half of the older children showed some awareness of the delay 
factor, but not a single one of the younger children did! 


Accelerated Auditory Feedback 


It takes time for normal auditory feedback to reach one’s cochlea. Peters
(1954) cites Stromsta’s estimate that normal bone-conducted feedback takes
at least .0003 seconds and normal air-conducted feedback at least .001
seconds. Peters then investigated the effect on reading rate of decreasing these
normal delay intervals to .00015 seconds by electronic means. This accelerated
feedback increased the reading rate, whereas delayed feedback decreases it.


Effects of Changing Other Dimensions of Auditory 
Feedback 


If the timing of the speaker's auditory feedback is left untouched, but
feedback is altered in other ways, his speech will still be affected. The speaker
will vary the intensity of his voice inversely with the intensity of his feedback
(Black, 1950a; Lightfoot & Morrill, 1949). The intelligibility of heard speech
varies directly with the intensity of it and also depends on the high-frequen- 
cy components in it. A speaker will increase the intelligibility of his speech if 
the intensity of his feedback is lowered (Black, 1950a; Black, Tolhurst, & 
Morrill, 1953) or if the high-frequency component is filtered out of his
feedback (Peters, 1955). 

In the preceding studies, the feedback was delivered to the speakers’ ears 
through earphones and was altered electronically. Black (1950a, 1950b) also 
manipulated feedback characteristics by varying the acoustical environment 
in which subjects spoke. He found that people spoke louder in “dead” than
in “live” rooms. Moreover, their voices became progressively louder in the 
“dead” rooms but softer in the “live” rooms. Wiener (1950, p. 170) men- 
tions a related phenomenon: when people use a “dead” telephone system, 
in which their own speech is not fed back to their ears through the receiver, 
they start shouting into the telephone. 

In short, speech appears to change in ways that compensate for experi- 
mentally produced variations in auditory feedback. Again, the feedback reg- 
ulatory system emerges as a very intricate one. 


















































































Effects of Abolishing or Masking Auditory Feedback 
Speech Changes in the Deafened 


Congenital or early deafness seriously impairs the development of speech. 
Of special interest here is the fact that the individual who suffers a hearing 
loss after he has learned to talk manifests several characteristic changes in his 
speech: a deterioration of precision of enunciation, a flattening of the intona- 
tion patterns, and a loss of control of loudness of his voice (Carhart, 1960). 
The latter varies with the etiology of the hearing loss. In conduction-deaf- 
ness, where there is damage primarily to the middle ear and not to the inner 
ear, the individual speaks very softly. Because his bone-conducted auditory 
feedback has not been impaired while air-conducted input has, his own voice 
sounds louder to him than the voices of others. He compensates for this
differential “ear experience” by lowering the intensity of his own voice. In
perception-deafness (nerve deafness), where there is damage to the inner ear 
or the auditory neural pathways, the individual speaks very loudly. He com- 
pensates for the total loss of auditory feedback. Thus the painful experiment 
of nature provided in the deafened individual demonstrates that the mainte- 
nance of acquired speech patterns depends on the presence of auditory 
feedback. 


Experimental, Temporary Hearing Loss 


Prolonged exposure to loud noises and tones results in temporary hearing 
loss of the perception type (e.g., Davis, Morgan, Hawkins, Galambos, & Smith,
1950). Black (1951) studied the effects of such temporary deafness on
speech intensity by comparing the voice intensity following exposure to loud
noise for 2 hours with that preceding such treatment. He confirmed the 
production of a temporary hearing loss by appropriate auditory measurements. 
His subjects spoke louder immediately following the exposure to noise and 
their voices decreased in intensity as their auditory thresholds recovered. 

Masking of Auditory Feedback 

This may be accomplished by means of loud low-frequency tones and 
especially broad-band noises, which include low-frequency components; 
that is, by the same techniques that mask perception of the speech of other 
people (Miller, 1951). When such tones and noises are administered through 
earphones to both ears, a speaker’s auditory feedback may be partially or
completely masked. The actual extent of the masking varies with the inten- 
sity and frequency composition of the noise, assuming that it is continuous. 
Such masking interferes with both bone-conducted and airborne feedback. 
The work of Galambos and Davis (1943, 1944), as well as that of Lowy 
(1945), strongly suggests that this masking is due to neural processes in the 
cochlea itself. 










One well-known effect of masking feedback is an increase in voice inten- 
sity. This is the Lombard effect (Lombard, 1910), which forms the basis of 
one procedure used to identify nonorganic deafness. Hanley and Steer 
(1949) found that as a binaurally administered, “airplane type” masking
noise was systematically increased in intensity, speakers increased the loud- 
ness of their speech, lengthened syllables, and decreased their word rate. 
Winchester and Gibbons (1958) obtained similar results for word rate but 
their data failed to reach statistical significance. Their study, however, was 
not an exact replication of Hanley and Steer’s. 

Wood (1950) observed subjectively the speech changes during oral reading 
by 20 college students when a “high-level white noise” completely masked 
their auditory feedback. He judged that every subject increased his voice 
intensity and decreased his speech rate with this experimental treatment. 

Wood believed that the masking condition also caused changes in the 
pitch, resonance, and intonation patterns of the voice. We refer to his de- 
scription of these effects later in the Discussion. 

Four different investigators, apparently quite independently, discovered 
that the masking of auditory feedback improves abnormal speech. Kern 
(1932), Shane (1955, research done in 1946), and Cherry and co-workers 
(Cherry & Sayers, 1956; Cherry, Sayers, & Marland, 1955) found that bin- 
aural masking produced a striking decrease in stuttering. Cherry and Sayers 
demonstrated, furthermore, that the effect of a low-frequency masking noise 
was much greater than the effect of a high-frequency noise (cut-off point was 
500 c.p.s.). This finding is important for it indicates that feedback masking 
is the crucial factor and not the mere use of noise. Only with the low- 
frequency noise is complete masking achieved. The masking techniques of 
Kern and Shane utilized low-frequency sounds. Birch (1956) and Birch and Lee
(1955) masked the speech of patients with expressive aphasia with a low-
frequency tone of 256 c.p.s. and found that, “In approximately 75 percent of
the patients tested, [verbal] performance was decisively improved” (Birch, 
1956, p. 3851). 


Conclusions From Literature Review 


The speech deterioration of the deafened provides clear evidence that con- 
tinual auditory feedback plays a significant role in the preservation of devel- 
oped speech patterns. The effects on speech of experimentally produced 
temporary hearing loss and of binaural masking underscore the dependence 
of speech patterns on auditory feedback. Habitual speech changes within 
seconds or minutes when feedback is masked. An increase in loudness of 
voice is the most well-documented change. The literature is scant as far as 
other changes are concerned, but what there is suggests that feedback
masking affects a broad spectrum of the dimensions of speech: rate of speech,














































pitch, vocal quality, and intonation, and those factors involved in the abnor- 
mal speech of stuttering and expressive aphasia. As meager as the data are in 
these respects, they too suggest that the feedback control of speech is an 
intricate matter. The DAF literature and the manipulation of nontemporal 
aspects of feedback pointed in the same direction. It appears that multiple
attributes of speech are delicately regulated by corresponding, multiple channels of 
auditory feedback. And these feedback sub-systems are all integrated in the regulation 
of the entire speech pattern. The literature supports the increasingly complex 
feedback models proposed by Lee (1950b), Fairbanks (1955), and Smith 
(1962). 

The feedback literature also poses a challenge. A sketchy outline of the 
aural monitor has been emerging during the past 15 years. Most of the 
information on which the sketch is based has come from studies of delayed 
auditory feedback, but the temporal is only one aspect of feedback. A great 
deal of work needs to be done on other aspects of auditory feedback to 
verify, correct, and extend what is now known or believed. This will not only 
result in a more detailed picture of the aural monitor and how it works; there 
is a strong possibility that future work on feedback in speech will be of more 
general significance because feedback control is such a prevalent behavioral 
phenomenon. 


PRESENT STUDY 


Purpose 


Originally, this experiment was designed to determine the role of auditory 
feedback from one’s own voice and visual input from interlocutors in the 
occurrence of common disturbances of spontaneous speech [chapter 9]. . . 
As the experiment was conducted for the original purpose, it became appar- 
ent that the masking of auditory feedback had many striking, grossly observ- 
able effects, many of which have not been reported in the literature or at 
least are not generally known. The purpose of this chapter is to present a 
survey of those effects and to consider their potential theoretical signifi- 
cance. The chapter is exploratory. It does not test any general hypotheses; 
that doesn’t seem fruitful at present. It does, however, lead to some hypoth- 
eses for further research. 

The paper is related to the research reviewed above in the following ways: 

a. The auditory feedback manipulation is complete binaural masking. 

b. The effects on normal, spontaneous, and extended speech of a speaker 
engaged in a dialogue are studied. When this research was conducted, only 
Wood (1950) had studied the effect of complete binaural masking on normal 
speech, as far as we know, and his subjects read in a monologue for a 







maximum of 3½ minutes. With only rare exceptions, the previous feedback
studies have utilized the monologue, oral reading of brief phrases, or 
sentences. 

c. The effects of binaural masking on the more general psychological state of
the speaker are studied, as well as the effects on speech. 


Method 


Overall Plan 


College students were interviewed under four different conditions: (a) 
when they sat in the usual face-to-face situation (F); (b) when they faced the 
interviewer but could not hear themselves because of the administration of a 
masking noise through earphones they were wearing (F-N); (c) when they 
were not facing the interviewer and thus could not see him because he sat 
behind them (B); and (d) when they could neither see the interviewer nor 
hear themselves talk (B-N). Exploratory work demonstrated the feasibility of 
these interview conditions. Students and associates and the writer himself 
spoke under the projected conditions without physical or psychological dis- 
comfort after an initial period of adaptation. The writer found he could 
interview under the projected conditions and gained familiarity with doing 
so. 

To distinguish between initial effects due to the pure novelty of the 
experimental conditions and the effects due to the essential nature of the 
conditions, each subject was interviewed three times. All interviews were 
tape-recorded and transcribed. The clinical and objective study of the tape 
recordings and interview transcripts, as well as recorded reports by the sub- 
jects during an inquiry at the close of each interview, provide the basic data 
of the study. The subjects took the Minnesota Multiphasic Personality In- 
ventory (MMPI) and were given the Wechsler Adult Intelligence Scale 
(WAIS) between interviews. Home interviews with the mothers of each 
subject occurred after he had completed his interviews; they are not relevant 
to the present chapter but are mentioned for the sake of completeness.³ In
the following paragraphs, the Method is described in more detail. 


Subjects 


A psychology professor at a then small state university, primarily concerned 
with training public school teachers, recruited the subjects from his second 


³I am grateful to the following people for their contributions to this project: the subjects,
Drs. Richard Waite and William Trinkaus, Naomi Miller, Irene Bickenbach, Judith Tillson,
Genoveva Palmieri, Gene Schulze, Sue Cohen, Susan Ehrlich, and Ruth Johnson.











































and third year classes. He stated that a psychologist at Yale was seeking both 
male and female subjects, whose parents lived within an hour's drive of the
school, for an experiment involving several interviews, psychological tests 
including the WAIS, and interviews with their mothers. He stated that the 
subjects would receive, in return, the experience, a small fee, and the results 
of the WAIS, which he himself would convey to them. 

Seventeen students, eight male and nine female, volunteered. Fourteen
intended to become schoolteachers. Their ages ranged from 20 to 27, four- 
teen being 20 to 22 years old. The group was above average in intelligence 
and of higher verbal than performance ability, as the following summary of 
the WAIS scores shows. 


Scores               M      S.D.
Vocabulary Scale     13     1.4
Verbal IQ            117    6.5
Performance IQ       113    8.2
Full Scale IQ        116    4.4


No precise social class indexing was carried out, but nine of the subjects 
seemed to be of lower middle-class background, seven from upper middle- 
class households and one from a lower-upper class household. The ethnic 
home background of the sample was quite heterogeneous, as is indicated by 
the following tabulation of the nationality or ethnic membership of the sub- 
jects’ parents: 


Ethnic Influences in Home of Origin

Anglo-Saxon (“American”)
Italian
Jewish
Irish-American
Greek
Czech
Hungarian
Irish-American and Jewish
Polish and Anglo-Saxon


All subjects were white and had been born and spent their entire lives in 
the United States. English was the native language for all subjects. Some 







had limited familiarity with the original language of their parents where this 
was other than English. No subject gave any gross signs of speech pathology 
or “foreign accent.” 


The Interviews 


Each subject went through the following schedule: Interview I, Inquiry, 
followed immediately by the self-administered, individual form of the 
MMPI; Interview II, Inquiry, followed immediately by WAIS; Interview 
III, Inquiry. The home interviews with the mothers occurred within 7 weeks 
after Interview III, except in one case when it took place on the day of 
Interview II. Interviews I, II, III occurred on different days, being dis- 
tributed over spans of 4 to 16 days, with a modal span of 8 days for 7 subjects. 
They took place in a room especially designed for high-fidelity sound record- 
ing of interviews [Mahl, Dollard, & Redlich, 1954]. 

The personal, psychological interview, modeled after the initial psychi- 
atric interview, was used in the belief that it would afford equally for inter- 
viewer and subject the most meaningful and useful situation for sustained 
repeated interaction. 

The following outline sketches the essentials of the three interviews; the
comments that follow it describe certain features more fully.


Interview I (familiarization), about 50 minutes long.
    Introductory phase: 5 to 10 minutes
    Interview proper: 30 minutes
        Continuous topics: interests, family, academic choice
        Conditions, fixed sequence: F, F-N, B-N.
    Inquiry: 10 minutes


Interview II, about 50 minutes long.
    Discontinuous topics: fixed sequence
        1. Current activities in school, work, hobbies.
        2. Most significant person in current life: nature of the person and
           relationship with subject.
        3. Most significant person in past life: nature of the person and
           relationship with subject.
        4. MMPI experience
    Conditions: sequence varied by subject (one topic per condition)
        F      10 minutes
        F-N    10 minutes
        B      10 minutes
        B-N    10 minutes
    Inquiry: 10 minutes

























Interview III, about 50 minutes long.
    Topic content and schedule ordinarily the same as in Interview II.
    Conditions: same as in Interview II, except that each subject
    starts with what was his second condition in Interview II.


Interview I: Familiarization This interview opened with the first meeting 
between the interviewer and the subject. They sat face-to-face and spent 
approximately the first 5 minutes discussing the nature of the scheduled 
procedures, time arrangements, and payment of fees. The essence of what 
the interviewer told each subject about the nature and purpose of the study 
was as follows: 


In this experiment, I’m interested in people’s reactions when they are talking 
under different conditions. First, when you're talking with me like this, face-
to-face, under the usual conditions. Then, when you can’t hear yourself talk
because of a noise playing through earphones in your ears. The noise will not 
be painful, but it is loud enough and has such characteristics that you won't be 
able to hear yourself talk. The next condition is when you’re not looking at me.
In that case, I just turn your chair around and I sit behind you. The final 
condition is when we are talking and you can’t hear yourself because of the 
noise while you are sitting with your back to me. 

I am doing this study now with well-functioning people like yourself and 
the other students. Later I may compare the reactions of all you people with
hospitalized psychiatric patients. 

Another thing I’m interested in is how different kinds of people react to 
these situations. We are all different and we all have different reactions to 
things. To help define the difference between yourself and your classmates, 
I'll also ask you to take the Wechsler intelligence test and a personality test, 
and I'd like to have an interview done with your mother. In all of these 
interviews, tests, etc., I’m not interested in how smart you are or whether you 
have such and such a complex. I’m only interested in seeing how you're 
different from the others and if your reactions to these conditions are related to 
these differences between yourself and the others. 

Today you can get used to the conditions while we talk and I can get to
know a little bit about you. We’ll talk sitting like this, then with the noise, and 
then with the noise while you're facing away from me. In the next two inter- 
views, we'll use all four of the conditions and they'll be in a different order 
each time. We'll be more systematic then. 


The instructions were designed to orient the subjects to their general reactions 
when talking under the different conditions rather than to the specific details 
of the speech process itself. In his responses to the initial questions asked by 
the subjects, the interviewer attempted to reinforce this general emphasis 
and it was further reinforced by the general nature of the Inquiry questions. 

Following the introductory phase, the interview proper began with this 
remark by the Interviewer: “I'd like to begin by finding out something about 







you—how old you are, what your interests are, how you came to go to 
teacher's college, about your family, and so on.” This comment defines the 
areas covered in Interview I proper, which lasted for about 30 minutes. The
goal of this interview section was to familiarize the subject with the various
conditions under which he would be speaking, with the interviewer in this
particular role, and with the general situation of a personal interview; it
also served to familiarize the interviewer with the subject.

The Inquiry always consisted of an open-ended exploration of the follow- 
ing questions: 


1. How did you feel in the different conditions? What was your inner
experience like when you spoke with the noise on? When facing away from 
me? And when these two were combined? 

2. What did the noise remind you of? 

3. How would you rank the conditions for comfort or discomfort? For ease 
of talking? 

4. When we were talking in the various conditions you might have had
some peripheral, fleeting thoughts you didn’t have a chance to mention. Do 
you recall any? Could you tell me about them? 

5. Is there anything else you think of that would shed light on how you 
experienced the different conditions? 


Interview II. This included all four conditions of talking and a somewhat 
restricted schedule of four general topics, one topic being considered in each 
condition. The topic sequence was the same for all subjects, but the se- 
quence of the conditions was varied systematically from subject to subject. 
The topics mentioned in the aforementioned outline were explored in an 
open-ended manner. 

The sequence of conditions for Interview II was determined in the following
manner. A basic sequence of F, F-N, B, and B-N, in that order, was
designated. The subjects were listed according to the order in which they 
were scheduled for their first interview. The first subject started the basic 
sequence in F condition, the second subject in F-N, the third in B, the 
fourth in B-N, the fifth in F, etc. This system provided four condition 
sequences, with four subjects in three sequences and five in the other. 





                    Sequence
N Subjects     F     F-N     B     B-N
    5          1      2      3      4
    4          4      1      2      3
    4          3      4      1      2
    4          2      3      4      1



























Interview III. It also included all four conditions. Now each subject
started one step over in the sequence he followed in Interview II. Again, 
four subjects followed three of the sequences and five the other sequence. 
The topic schedule was more variable in this interview. Because the mainte- 
nance of “natural” and spontaneous interchange was the principal goal, the 
interviewer played it by ear. Although he basically attempted to renew dis- 
cussion of the topics of Interview II in the same sequence, he did deviate 
from this approach whenever he felt it resulted in constrained interaction or 
seemed forced and artificial. In such cases, he pursued something from the 
preceding portions of this or the first two interviews, which the subject
seemed interested in discussing further. 


The Masking Noise 


This consisted of frequencies up to 500 c.p.s. in equal intensities. This 
particular masking noise was chosen for several reasons. It is very effective in
masking all auditory feedback, both bone and air conducted. Cherry and 
Sayers (1956) had demonstrated that the use of this masking noise produced 
a striking reduction in stuttering. And such a noise is somewhat less noxious 
than one containing higher frequencies. 

The noise, produced electronically and permanently recorded on tape, 
was administered into the subjects’ ears through padded earphones simply
by playing the tape on a recorder. The author operated the recorder by 
means of a foot switch. In the noise conditions, the masking noise was 
administered at all times except when the interviewer spoke. The subject 
could hear the interviewer easily when the noise was stopped even though 
he was wearing earphones. The interviewer, of course, could always hear his
own speech and that of the subject. 

The subject first donned the earphones after the introductory phase of 
Interview I. Thereafter, he wore them in all conditions except during the 
inquiries. The noise playback volume was variable, being set and main- 
tained at that level reported by the subject to produce complete masking of 
his voice. The average volume was approximately 93 db above the reference 
level of .0002 dynes/cm². The important thing is that, generally speaking,
subjects had no awareness whatever of their voices with this procedure. 
There was a total of six noise conditions for each subject in the three 
interviews.
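
For readers who want the stated level in physical units, the following sketch applies the standard decibel relation for sound pressure, p = p_ref × 10^(L/20). The function names are hypothetical; only the 93 db figure and the .0002 dynes/cm² reference come from the text.

```python
import math

# Reference pressure stated in the text, in dynes/cm^2.
REFERENCE_PRESSURE = 0.0002


def level_to_pressure(level_db, reference=REFERENCE_PRESSURE):
    """Convert a sound pressure level (db re `reference`) to dynes/cm^2."""
    return reference * 10 ** (level_db / 20.0)


def pressure_to_level(pressure, reference=REFERENCE_PRESSURE):
    """Convert a pressure in dynes/cm^2 back to db re `reference`."""
    return 20.0 * math.log10(pressure / reference)


if __name__ == "__main__":
    # Average masking level reported in the text.
    print(round(level_to_pressure(93.0), 2), "dynes/cm^2")
```

On this relation the average masking level corresponds to a pressure of roughly 9 dynes/cm².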


Study of the Tapes and Transcripts 


Externally Observable Effects of Interview Conditions. The author observed 
certain effects of the masking procedure during the exploratory work, during 
the actual interviews of this study, and in listening to all the interview tapes. 
From these observations, he established a set of categories used in a careful 
restudy of the recordings and transcripts. The categories, listed in Table
21.1, are of two general classes: one class refers to speech attributes per se, such
as loudness, pitch, etc.; the other refers to more general psychological changes
inferred from or manifested in verbal behavior. Nearly all the raw data of this 
study pertaining to externally observable effects of the masking noise are 
judgments by the author. In some cases, supplementary, more objective data
were obtained by procedures that are more appropriately described when 
these additional data are presented in Results. 


Subjective Experiences of Subjects. The open-ended inquiries were summarized
and then coded with a set of categories, some of which generally
convey the things frequently reported by the subjects and others of which 
provide information about the subjects’ experience of certain of the phe- 
nomena that could be observed externally. 


Results 


The externally observable effects of the masking noise are described first. In 
the course of doing so, the subjective experiences especially relevant to the 
observable changes are cited. The final section of Results considers the subjective
experiences as a whole.


Externally Observable Effects of Masking Noise 


The masking noise condition produced the effects summarized in Table 
21.1. As far as could be determined by the gross observational method 
employed, these effects were the same in the F-N and B-N conditions. The 
following comments elaborate and illustrate the data of Table 21.1. 


Linguistic Changes. Loudness. All subjects spoke louder with the masking 
noise. This change was usually sustained throughout a given condition, but 
the level of loudness was not constant. Occasionally, the voice would be- 
come dramatically loud, and a few subjects momentarily spoke with subaudi- 
ble intensity. Voluntary control of voice volume was apparently minimal in 
the masking conditions, for subjects persisted in speaking very loudly even 
though the interviewer told them they were doing so and that he would be 
able to hear them clearly if they lowered their voices.⁴

Intonation. The subjects characteristically showed some degree of flatten- 
ing of the intonation pattern of the English sentence. In extreme instances, 
sentences showed very little variation in pitch or stress. 

Voice style. This term refers to a variety of changes. The specification of 
the exact details of the change is a task for the trained phonologist and 


⁴[He did this to reduce the possibility of excessive fatigue of the subjects.]



TABLE 21.1
Observable Linguistic and Behavioral Changes During Masking of
Auditory Feedback

                                                            Average N Noise
                                                            Conditions per
                                                            Subject in Which
                                         N of the 17        the Change Was
                          Nature of      Subjects Showing   Judged to Occur
Category                  Change         the Change         (Highest possible 6)

A. Linguistic
   Loudness               increase            17                 5.8
   Intonation             flattens            17                 5.5
   Voice style            changes             17                 4.9
   Prolongation           increase            17                 4.6
   Pitch                  changes             15                 4.5
                                          (higher 9)
                                          (lower 6)
   Vocal noises           increase            15              [illegible]
   Slurring               increase            12                 1.5
   Rate                   changes             12              [illegible]
                                          (decrease 7)
                                          (increase 5)
   Phrasing               more distinct       10                 1.8
B. Behavioral-Psychological
   Affect expression      increase            14                 2.4
   Associative response   freer               14                 2.4
   Cognitive confusion    increase            11                 1.2
   “Thinking aloud”       increase             5                  .5


linguist; all the writer can do is present here for each subject, terms and 
images that occurred to him as he listened to the tapes. The following 
summary itemizes changes judged to have occurred in each subject with the 
masking noise and the number of noise conditions in which the change 
occurred. If this number is less than four, the interview will also be given. 
Generally, the changes noted were characteristic of most or the entire dura- 
tion of the condition. 


Female Subjects 





Subject 1. Voice loses whispering quality (6 conditions). 

Subject 2. Loss of subdued, voiceless quality (5 conditions) and, in addi- 
tion, voice is piercing and clear (2 of these conditions). 

Subject 3. Voice is more nasal (5 conditions). 

Subject 4. Voice is more nasal (4 conditions). 








Subject 5. Voice is more nasal, and loses its whispery, wistful, soulful tone 
(4 conditions). Also, speech sounds less cultured (1 condition of Interview
II).

Subject 6. Voice is more nasal and sounds less cultured, causing this 
listener to think of speech of Molly Goldberg and Jack Benny’s telephone 
operator (4 conditions). 

Subject 7. Voice loses whispering quality (1 condition, Interview I) and 
sounds more nasal (1 condition, Interview III). 

Subject 8. Sounds “voiceless” and boyish at times (1 condition, Interview 
III).

Subject 9. Sounds “voiceless” and hollow at times (1 condition, Interview 
II).


Male Subjects 





Subject 10. Denasalization occurs causing the subject to sound less
cultured; like he has a “code in the head” and to remind the listener of 
the stereotype of the “punch drunk” fighter (6 conditions). 

Subject 11. Voice sounds less hoarse or raspy, and speech becomes tele- 
graphic and uncultivated (6 conditions). 

Subject 12. Voice sounds less hoarse and “growly”; sounds “mouthier”
and speech is less cultured (6 conditions). 

Subject 13. Voice is less “picky”; sounds harsher and more aggressive and 
masculine; dialect is less cultured (6 conditions). 

Subject 14. Voice quavers (5 conditions). 

Subject 15. Sounds more resonant (5 conditions) and “tougher” (2 condi- 
tions—one each in Interview I and II). 

Subject 16. Increased nasality (3 conditions: 2 in Interview I, and 1 in 
Interview II). 

Subject 17. Sounds less “froggy” and less strained (2 conditions, Inter- 
view I). 


A shift toward lower social status dialect, that is, in phonological features, was
the principal determinant of the impression that six subjects sounded “less
cultivated” when speaking in the masking conditions. Thus, Subject 13,
American born of Italian immigrant parents, characteristically said the sound
/θ/ in “think, through, etc.” more like “tink, trough” in the masking condition,
and the voiced sound /ð/ in “that, this” more like “dat, dis.” He also
showed shifts toward the lower social status phonetic position of the vowel
/ɔ/ in words like caught, talk, thought, and saw.





TABLE 21.2
Frequency (Percentage) of “th” Variants in Speech of Subject 13
in the Four Conditions

                        Facing-     Facing Away-    Facing-    Facing Away-
Variants                No Noise    No Noise        Noise      Noise
θ, ð (think, that)        86.5        75.9           74.5        68.8
t, d (tink, dat)          13.5        24.1           25.5        31.2

                         100.0       100.0          100.0       100.0
N Occurrences             333         261            541         362





The author heard such changes in the speech of Subject 13 as the masking
noise conditions were introduced. In this particular instance an analysis by a
linguist, Professor William Labov. . . who has studied intensively social
class dialect on the Eastern seaboard (Labov, 1966), provides invaluable
data. Labov carefully analyzed the tape recordings of this subject's speech
during the four experimental conditions in all three interviews. Table 21.2
summarizes the frequency with which Labov observed the subject using the
standard variants, /θ/ and /ð/ (as in think and that), and the substandard
variants, /t/ and /d/ (as in tink and dat), in the four conditions. Table 21.2
shows that both the masking noise and the interruption of visual contact with
the interviewer were associated with a shift towards the lower class forms.
Thus, in the face-to-face condition without the masking noise, only 13% of
this subject’s “th” variants were of the lower class forms; but when the
masking noise was introduced, 25% were of the lower class forms. There was
a similar increase from 24 to 31% when the noise was introduced into the
“facing-away” condition. Changing from the face-to-face conditions to the
facing-away conditions was also associated with comparable increases in the use
of the lower class forms. The author had not detected this effect. Thus it is
possible that a more refined analysis of all the tapes would reveal effects of
the change in the visual condition of the interview, which the author did not
observe in his more global assessments.
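
The two shifts described in this paragraph can be read directly off the percentages in Table 21.2. The sketch below simply takes differences between cells; the dictionary keys and function names are hypothetical labels, and the figures are the lower-status (/t/, /d/) percentages from the table.

```python
# Percentage of lower-status variants (/t/, /d/) in each condition, Table 21.2.
LOWER_STATUS_PCT = {
    ("facing", "no_noise"): 13.5,
    ("away", "no_noise"): 24.1,
    ("facing", "noise"): 25.5,
    ("away", "noise"): 31.2,
}


def noise_shift(visual):
    """Percentage-point rise when noise is added, holding visual contact fixed."""
    return round(
        LOWER_STATUS_PCT[(visual, "noise")] - LOWER_STATUS_PCT[(visual, "no_noise")],
        1,
    )


def visual_shift(noise):
    """Percentage-point rise when the subject faces away, holding noise fixed."""
    return round(
        LOWER_STATUS_PCT[("away", noise)] - LOWER_STATUS_PCT[("facing", noise)],
        1,
    )


if __name__ == "__main__":
    for visual in ("facing", "away"):
        print("noise effect,", visual, ":", noise_shift(visual))
    for noise in ("no_noise", "noise"):
        print("facing-away effect,", noise, ":", visual_shift(noise))
```

Each of the four differences is positive, which is the pattern the paragraph summarizes: either manipulation alone raises the share of lower-status forms.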

Occasionally there was increased use of entire word-forms characteristic of 
less cultivated English speech. Subject 13, for example, uttered the vocative 
form “see” 14 times in his first masking condition of Interview I, after he 
had spoken throughout the preceding nonmasking condition without a single 
instance of this form. This differential use of “see” in the masking—non- 
masking conditions did not occur in subsequent interviews, where the form 
appeared at most twice in a given condition. Subject 6, a young woman born 
in this country of Jewish parents and reared in the Jewish community, 
showed the Yiddish feature of starting utterances with “So, ______,” in contexts
where an American English speaker might say “And ______,” but when
neither conjunctive form is essential to the meaning of the utterance. In all
the masking conditions, she used this form 49 times; but in the nonmasking
conditions the frequency was only 35 (χ² = 2.33, p = .07). Thus the
frequency of this form increased 40% in the masking condition. This subject 
also responded twice to interviewer queries in one masking condition with 
the expressive introductions to her replies, “Don’t ask me.” In the same
condition she also said, “So he said to me ‘what am I thinking about.’” Such
striking idiomatic expressions never occurred in the nonmasking conditions. 
All these usages were among the cues that reminded the author of “Molly 
Goldberg's speech.” 
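
The chapter does not say how the chi-square for the 49 versus 35 comparison was computed. One reading that reproduces the reported value is a goodness-of-fit test against an equal split between masking and nonmasking conditions; the sketch below works under that assumption, with hypothetical function names, and also checks the 40% increase.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Chi-square (1 df) testing two counts against an expected 50/50 split.

    The equal-split expectation is an assumption of this sketch.
    """
    expected = (count_a + count_b) / 2.0
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def p_value_one_df(chi_square, one_tailed=False):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    two_tailed = math.erfc(math.sqrt(chi_square / 2.0))
    return two_tailed / 2.0 if one_tailed else two_tailed


def percent_increase(new, old):
    """Percentage increase of `new` over `old`."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    chi2 = chi_square_equal_split(49, 35)
    print(round(chi2, 2))
    print(round(p_value_one_df(chi2, one_tailed=True), 3))
    print(round(percent_increase(49, 35), 1))
```

Under this reading the statistic agrees with the reported 2.33, and a one-tailed probability falls close to the reported level; whether the original test was in fact one-tailed is not recoverable from the text.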

Prolongation. This was a phenomenon of intermittently increasing the 
duration of sounds. When it did occur it was usually at the end of phrases or 
sentences and consisted of protracting the phonation of single syllable words 
or the endings of longer words, as in the following example from the speech 
of Subject 13: 


I mean, she... ah... didn’t wanta be alooone. Like if... ah... my 
mother and father went out, why she would call them up where they were and 
say come on hooome. Y'know. I don’t like to be left alooone. Little things like 
that. 


The combination of prolongation and flattened intonation often imparted a
marked singsong quality to the speech. 

Pitch. All but two subjects spoke at a different pitch level in the masking
conditions than in the nonmasking conditions. Nine subjects raised their 
general pitch level whereas six lowered it. Whenever a change in pitch 
occurred it was consistently in the same direction for a given subject. 

Vocal noises. When speaking with the masking noise all but three of the 
subjects produced guttural sounds. These were “noises” in that they are not
English phonemes and did not sound like the usual vocalizations—ah, uh, 
etc. Some were similar to “croaking” or “choking,” strangulation noises. 
These metaphorical terms might create the impression that the subjects 
were straining in an effort to talk when they produced the noises. That 
would be erroneous. They did not seem to be doing so to the interviewer. 
The noises could occur in the absence of any visible sign of effort and were 
not disruptive of speech. These noises occurred either during pauses or upon 
the onset of a word, phrase, or sentence. The following interview excerpts 
illustrate the positions and linguistic contexts in which they occurred. An 
“x” indicates the occurrence of a noise approximately the duration of a 
syllable. 

The following subject was unusual in that one of his noises replaced the 
word “she,” illustrated in the first passage, and some of his noises lasted 
several seconds as in the second passage. 


Subject 11 


First excerpt 


Interviewer: I wonder if you could give a picture of what she’s like? (i.e.,
subject's wife). 

Subject: Well, (omits “‘she”) . . . (xxxx) is wonderful, and a very good wife, and 
a very good girl, a good Catholic, uncomplicated. And. . . uh . . . (xxx) she’s 
not overly intelligent, but I mean in the affairs of everyday life she’s . . . (xx) 
well, she’s down to earth. And... uh... well, she’s... uh... 
uncomplicated.


Second excerpt 


And...uh... I dunno what age they (i.e., mother and father) were when
they got married. —(xxxx . . . xxxx . . . xxxx)—. Anyway, we lived in B— for
awhile, and I was born in B—, etc. 


Slurring. Unusually indistinct articulation occasionally occurred in the
masking conditions. Only one subject showed increased slurring in all six
masking conditions; he spoke the least distinctly of all the subjects without
the masking noise as well. Six subjects slurred only in the third interview and 
four subjects did it for the first time in the second interview. Only two 
subjects slurred in the first interview. Thus this was a late-appearing phe- 
nomenon, which fact may account for its low incidence in Table 21.1. 

Rate. No sharp distinction was made between articulation rate when 
speaking and overall word-rate per unit time in judging rate of speech. As 
Table 21.1 indicates, some subjects spoke slower and some faster in a small 
number of the masking conditions. There was no relationship between the 
incidence of rate changes and slurring. 

The following tabulation shows that the male and female subjects dif- 
fered considerably in this effect of the masking condition. 


               Kind of Rate Change
            Faster    Slower    None
Males          0         6        2
Females        4         1        4


Phrasing. This term refers to the fragmentation of a continuous utterance 
into a discontinuous series of phrases demarcated by noticeable but brief
pauses. The net effect was the impression that the subject was speaking in 
phrases and not in sentences. 


General Behavioral-Psychological Changes. Affect expression. Fourteen subjects
manifested greater affect in one or more of the masking-noise conditions
than in any of their nonmasking conditions. The types of changes observed
included: variations in laughter, greater general spontaneity of affect
expression, increased excitement, anger, and sensuality or erotism.

In the masking conditions, the laughter was more frequent, of longer 
duration, louder of course, sometimes more paroxysmal and more erotic. 
Changes in laughter were largely characteristic of the female subjects, being 
apparent in seven of them but in only one of the men. 

In some cases, the change in affect expression was general throughout a condition. 
Male Subject 13, for example, characteristically sounded assertive and ag- 
gressive in the masking conditions but obsequious and somewhat effeminate 
in the nonmasking conditions. The lower upper class young woman, Subject 
4, characteristically spoke with greater vitality and spontaneity with the 
noise, causing the listener to be more interested in what she was saying and 
to find greater enjoyment in hearing her speak than was true when she spoke 
in the nonmasking conditions. 

At times, intense affect was expressed in relation to the personal content being 
discussed by the subject. Female Subject 9, who was usually quite pleasant and 
easygoing, sounded exceedingly angry and spoke in an extremely loud 
shouting voice as she recounted having been angered several years earlier by 
an unreasonable high school teacher. No comparable intensity of affect was 
observed in the nonmasking condition. 

An especially interesting increase in content-related affect was shown by 
female Subject No. 7 as the following interchange took place in one of the 
masking conditions. Just before this fragment of the interview the patient 
was speaking about her religious conversion at an Evangelical Bible Camp 3 
years ago (“vocal noises” also indicated by x): 


Interviewer: But did you have any kind of an emotional experience? 
Subject: Yes I did.
Interviewer: What was that like? 


Subject: (Subject characteristically seemed excited when speaking with mask- 
ing noise. In this passage her excitement increased and her face started to 
become red.) Well, .. . ah. . . I’m very emotional anyway when it comes to 
sad things so I can’t say this is just a sad experience or a happy experience or an 
emotional experience. I feel as if the Holy Spirit really touched my heart and 
made me want to repent. And take Christ as my personal saviour. And. . .
ah. . . it brought tears to my eyes. But I don’t want to believe that this is just
an emotional experience. I . . . because that’s not enough. 


Interviewer: Uhuh. 


Subject: Because if it was, it wouldn’t last. So it’s something that you (x) take for 
life. In other words, your eternal life begins right then, instead of beginning 
when you die. Begins right on earth. 


Interviewer: Uhuh. What did it seem like when you had that experience? You 
felt that the Holy Spirit had touched your heart? 


Subject: (With the following utterance, her excitement becomes very intense, 
face becomes redder, and toward the end of it her upper lip is curled back on 
one side.) Well, I felt like a very (x) insignificant sinner who had been forgiven 
by my decision. And I felt very happy. Extremely happy. Although it’s no bowl
of cherries because (x) most anyone who has any type of . . . ah . . . strict
religion, is usually persecuted in some way. And as a Christian, I know I'd be 
persecuted. In fact, I know I haven't really been persecuted enough, because I
haven't stood up for what I should. That's one of the reasons I feel I have back- 
slidden—because I haven't stood up for what I really believe, in many 
situations. 

(As he listened to this psychotically flavored content and observed her increas- 
ing level of agitation, the interviewer thought he should determine her imme- 
diate capacity for “testing reality” and coping with her paranoid ideation. He 
felt somewhat alarmed and was considering bringing the experiment to a close 
for this subject.) 


Interviewer: Uhuh. How do you think the . . . ah. . . how are the Christians 
persecuted? 

Subject: (sounding quite surprised) Pardon me? 

Interviewer: You know, you said that you felt that you hadn't been persecuted 
enough. Can you explain that to me? . . . I'm interested in finding out. 


Subject: At first when I had this conversion experience and took Jesus as my 
saviour— 


Interviewer: Yeah. 
Subject: (x) A lot of people laughed at me, and 
Interviewer: Uhuh.


Subject: And you know . . . not laughed, but at school you're different. You are 
not a worldly person. And . . . ah . . . people noticed it. And now I feel
as if I've become very worldly again. I'm not living the type of life I should,
and. . . Not that I want people to laugh at me. I don’t. And I don’t think that's
the idea of becoming a Christian, just to be persecuted. I think as a Christian 
you— 


Interviewer: Yeah. 


Subject: Spread the message and you show your love. You don’t go around being 
self-righteous. That’s not what I mean. But (x) I... I... I feel as if I... 
(x). . . I'm not standing up to what I really believe enough. 

Interviewer: (Feeling now that a crisis was past, perceiving the subject quieted 
as she progressively reassessed and negated the paranoid ideation.) Uhuh.
That's interesting. I'd like to talk more about that in some of the other
interviews. 


This young woman became excited in five of the six masking-noise conditions
and on two other occasions her excitement reached a high level as she
spoke about specific content: once when speaking of her boyfriend when she
also laughed in a “devilish and libidinal” manner, and again when she
uttered a loosely organized, symbolically toned, almost incoherent statement
about her teaching aspirations.


This subject’s MMPI scores were all within normal limits, but they
showed the typical “schizophrenic cluster” on the paranoid, psychasthenic,
and schizophrenic scales and these scale scores all fell very close to the upper 
limit of the normal range. Apparently her definite capacity for heated 
thought, psychotically toned both in form and content, springs from her 
general psychological status at the time of this investigation. In the non- 
masking conditions she frequently negated paranoid thoughts. Several 
times, for example, she said of her present life circumstances, “I’m not the 
victim of circumstances.” But in the masking conditions, the underlying 
psychotic affective and ideational tendencies episodically became more 
manifest. 


At times, the increased affect expression consisted of direct emotional responses to
the interviewer. In evaluating the examples about to be presented, the reader
should picture the interviewer through the subject’s eyes: he’s visibly 20 
years older, a stranger, and a gray-haired, pipe-smoking “professor” of psy- 
chology at a university that overshadowed the subject’s college. 


Some of the emotional reactions toward the interviewer were openly
positive or thinly veiled erotic ones. Thus one of the young women, No. 1,
spoke affectionately as she said the following in a masking condition in the 
third interview: 


Subject: (After a long description of her not completely satisfactory relationship 
with her boyfriend, including episodes of his being inconsiderate, she sud- 
denly laughed.) I'm thinking of you sitting over there (laugh). 


Interviewer: What are you thinking? 


Subject: I was just watching you smoking. You look so calm and relaxed.


Interviewer: Well, what did you think? 


Subject: I was . . . I was thinking you looked so . . . ah . . . well, I guess I should
use the word ‘understanding’ again. You look so understanding sitting there
and . . . I just felt like talking. (laugh) I've nothing in mind to say, but you are
the type of person that people can talk to. 


Interviewer: Thank you. 


Subject: You are really, (laughs). I'm not trying to give out compliments but, I
mean, it. . . (xx). . . I think that’s a wonderful trait for a person in your field to 
have because you really need it. 


Interviewer: I like to talk with people.

Subject: You can see you do. 

Subjects also expressed anger toward the interviewer. Thus a female 
subject, No. 5, suddenly and sarcastically spoke as follows in one of the mask- 
ing conditions: 

Subject: You're quite comical (laughing). 

Interviewer: What? 

Subject: You're quite comical (laughing). 

Interviewer: Why? Why? 

Subject: The position you have. 


Interviewer: Oh? Why? T. . . tell me about that. How did it seem comical to 
you? 


Subject: (Pause) Well, I don’t know (laughing), it’s just the idea that you're 
interfering. 


Interviewer: How. . . ? 

Subject: (interrupting) Can’t you hear me? 
Interviewer: Yeah. 

Subject: Oh, can’t you hear me? 

Interviewer: Yeah, yeah, sure. 

Subject: You're interfering in my train of thought. 
Interviewer: How? 


Subject: By pressing it, of course (laughing). [i.e., pressing the foot switch
controlling the masking noise] 


Interviewer: (laughs) But you said my position was comical. What did you mean? 





Subject: Yes. You're sitting there with a pipe, just like. . . ah. . . I don’t know,
some English gentleman I suppose (laughing). 


Associative response. This term refers to the flow of utterances by the 
subject during the interviews. “Freer associative response” means that the
subject talks more readily, a change that he may manifest in several ways.
He may say more in response to the interviewer's comments; he may re- 
spond more quickly to the interviewer; and the interviewer may find that it is 
not as necessary to ask questions and that simple acknowledgements that he 
is listening are all that are necessary on his side to maintain a continual 
stream of utterances from the subject. At times this effect of the masking 
condition was very impressive, for the subjects would continue talking at 
such length that the interview transcripts would run two or three pages 
consecutively with only one comment per page by the interviewer, and often 
none at all. 

Quantitative measures reflect more objectively the “freer associative re- 
sponses” observed in the clinical survey of the interviews. The interview 
typescripts followed the format of the interview excerpts that have been 
presented throughout this chapter and the page margins were fixed. Upon 
inspecting this format it is apparent that a count of the number of words 
uttered by the subject per page provides an index of the quantitative aspect 
of what we have called the “associative response.” Therefore, we tabulated 
the words per page under the masking and nonmasking conditions for each 
subject. These tabulations were carried out only for Interviews II and III. 
The inclusion of Interview I would have introduced a constant bias in the 
results because the nonmasking condition was always the first condition of 
that interview and at the same time it was always one in which the inter- 
viewer was most active. The interviewer's relatively high-activity level in 
itself would automatically lower the subject's page word count in the nonmasking
condition of Interview I. The effect of the masking condition was
clearly to increase the verbal productivity for the group as a whole (p < .001);
the average change in average words per page was 24%.
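
The words-per-page index lends itself to a short sketch. The chapter does not state exactly how the 24% figure was averaged, so the version below assumes one plausible reading: a percent change is computed for each subject and the percent changes are then averaged. All function names are hypothetical, and the page counts in the usage example are invented solely to show the arithmetic; they are not data from the study.

```python
def words_per_page(page_word_counts):
    """Average number of subject words per transcript page."""
    return sum(page_word_counts) / len(page_word_counts)


def percent_change(masked_pages, unmasked_pages):
    """Percent change in words per page from nonmasking to masking."""
    masked = words_per_page(masked_pages)
    unmasked = words_per_page(unmasked_pages)
    return 100.0 * (masked - unmasked) / unmasked


def mean_percent_change(subjects):
    """Average of the per-subject percent changes (an assumed reading)."""
    changes = [percent_change(m, u) for m, u in subjects]
    return sum(changes) / len(changes)


if __name__ == "__main__":
    # Invented page counts for two imaginary subjects, NOT data from the study:
    # (words per page in masking conditions, words per page in nonmasking).
    illustration = [
        ([250, 270, 260], [200, 210, 190]),
        ([220, 220], [200, 200]),
    ]
    print(mean_percent_change(illustration))
```

Because the page format and margins were fixed, a higher words-per-page figure for the subject means fewer interviewer turns on the page, which is why the count serves as an index of the associative response.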

One also gained the distinct clinical impression that a qualitative difference in the 
associative response occurred, as well as the quantitative change just re- 
ported. The subjects revealed fairly intimate, personal material in these 
interviews, and often to a greater degree in the masking conditions. The 
following instance illustrates this apparent phenomenon. During one of the 
masking conditions, a male subject, when asked to tell the interviewer about 
his family, told the following in detail. When he was a very young child his 
father, a factory worker, brought a male friend to his home. Soon this man 
became a roomer in the household. He and the subject’s mother became 
lovers and would have intercourse upstairs at night, while the father would 
become drunk downstairs. This home situation persisted for many years and
resulted in one illegitimate child, a half brother of the subject. In the Inqui- 
ry, this subject said he preferred the masking condition of this interview 
because he didn’t hear the unpleasant things he was saying. 

Eleven subjects, six females and five males, spontaneously (!) referred during the 
Inquiries to the externally observable changes in associative response. Their com- 
ments revealed an awareness by them of both the quantitative and quali- 
tative effects of the masking conditions. Three subjects felt they “rambled” 
during the masking conditions. Eight subjects stated quite directly, although 
in various ways, that they felt less inhibited when talking in the masking 
conditions. 


One subject felt an “urge to confess” and felt less inhibited but, at the same 
time, “resistant at letting out all these intimate things.” 


Another said he had told something he had never told anyone before, i.e., his
complex private feelings about the appearance of his acne-marred face. 


Two subjects felt they said “too much,” “revealed too much.”


One sensed an increase in her ability to recall memories of her past life and in 
the vividness of the memories. Another thought “maybe I admit things I 
wouldn’t ordinarily bear to think” in the noise condition. Still another subject 
said she had discussed things she ordinarily wouldn’t discuss. 


The final subject said that during the masking conditions he felt he had a lot to
say and wanted to tell so much that there wasn’t enough time. 


Cognitive confusion. This category includes relatively minor variations in 
normal syntax, simple forgetting of what one was talking about, and disor- 
ganized sequence of utterances. 

The following excerpt from masked speech illustrates some of the minor, 
but unusual, syntactical variations. Subject No. 2 said: 


I took piano and he took accordion. And . . . uh . . . I guess I took about 8 or 9 
years lessons. 


The following interchange with another subject, No. 10, contains a more 
complicated syntactical confusion; it involves the underlined “quite often,” 
not the many normal sentence changes. The solid and dotted arrows are 
discussed later. 

Interviewer: Do you see a lot of her?

Subject: Well, I have more during the . . . since we’ve been out from
school. Wh . . . when I went to school, I had one . . . I was in one class
with her. But other than that I . . . I saw her perhaps for 5 or 10 minutes
a day. So the only time we've ever got to go out would be on a Friday
or Saturday. So . . . however since the . . . uh . . . school’s been let
out, I’ve seen a lot more of her. Wh . . . she . . . uh . . . she and I go
to the beach. We went to the beach yesterday quite often (lower volume)
[illegible] going to go to X_____ and maybe I’ll see her . . . uh . . . well, Saturday,
maybe tomorrow night.

[In the original figure, “quite often” is underlined; a solid arrow links it to the end of “she and I go to the beach,” and a broken arrow links it to an earlier point in the passage.]

The “quite often” occurs out of context but it was uttered without a break
in tempo. Possibly the “quite often” migrated from the end of the preceding
statement, as indicated by the solid arrow; the reconstructed, “She and I go
to the beach quite often,” is meaningful. But it is also possible that it
migrated from some earlier point in time, such as that indicated by the
broken arrow. This reconstruction also “makes sense.” (The fact that the
“quite often” was uttered at a lower volume level suggests another possibility:
that it was the subject's intention to inhibit these words. The basis
for this speculation becomes clear when more definite examples of “thinking
aloud” are examined later.)

There were many instances in the masking conditions where subjects lost
their train of thought, forgot what they were talking about, and simply acknowl- 
edged that fact and fell silent. Sometimes the subjects coped with their loss 
of train of thought by repeating earlier words or phrases. They reported this 
device enabled them to proceed with only a minimal amount of confusion. 

Sometimes the loss of train of thought was obviously due to the sudden intrusion of 
other trains of thought. Vacillation between different lines of thought often 
produced a series of utterances confusing to the listener. This happened in 
the following interchange, at the end of which the subject (No. 11) picks up 
the original train of thought. The instances of “cognitive confusion” caused 
by intruding thoughts are underlined and asterisked. Except where indi- 
cated the subject is speaking very loudly. 


Interviewer: Mhm. What kind of things do you do together?


Subject: Well, before the baby came we just. . . ah. . . went to the movies, 
played cards. . . ah. . . went swimming, went for rides, and. . . ah just spent 
our time together at home, xx (vocal noise) and television. And. . . ah. . . now 
we don’t get to go out much. We spend most of our nights at home watching 
television or . . . she’ll help me study for a test, . . . but . . . ah . . . our social
life is pretty limited now. 


Interviewer: Get along okay together? (i.e., subject and his wife)


* Subject: We get along very good. I mean . . . ah . . . I get along (much lower
volume) . . . ah . . . we get along fine. I mean . . . ah . . . she’s a little . . .
ah . . . gets a little tired and cross at times because she has to . . . ah . . . stay
in, I mean she doesn’t like to go out all the time, but she likes to meet people.
She’s a very friendly girl. She makes friends with . . . ah. . . everyone she 
meets. And she gets a little lonely because she’s . . . for all of her life she’s 
worked in an office and had a lot of girl friends. And she comes from a big 
family, so it gets a little lonely at times for her. And I'm not financially . . . 
or . . . ah . . . ah . . . I don’t have enough time *to . . . ah . . . go out dancing or go
to the... movies. . . or spend as much time away from my studies as I would be if I 
wasn't married. But... ah. . . xxx (vocal noise). . . as far as doing * things 
together... ah. . . getting along, that’s it, getting along (silent laugh)... ah... 
we get along in an uncomplicated way. 


In the last passage, the subject spent most of his time talking about his wife's 
feelings. As he was doing so, however, statements referring to his own 
sentiments intruded out of context on two different occasions. And as he 
spoke of how he and his wife got along together, a statement about their 
doing things together intruded, apparently being the product of the
perseveration of the line of thought relevant to the preceding passage. The subject
himself showed awareness of his cognitive confusion in the last utterance of 
the excerpt. 

Another type of cognitive confusion consisted of an exceptional degree of 
fragmentation and syntactical disorganization of utterances, such as occurred in 
the following excerpt. 


Interviewer: What was it about her (i.e., a former teacher) that you liked?


Subject No. 7: Well, x (vocal noise) at the time I. . . I didn’t really know what a 
good teacher was and what a bad teacher . . . their teaching methods. She 
seemed to be a fairly good teacher, in . . . in the actual teaching, and yet she 
was a very nice person. She was a person to be expected . . . respected, and 
she’s a person that... ah... in awh... She inter . . . within her teaching 
program... ah... within teaching us she... ah... intramingled. . . 
ah... the teaching of democracy and teaching of high principles for us as 
citizens. Yet it didn’t sound corny, it sounded very good, and I think x (vocal 
noise) that’s one reason I like her, because I respected her so much as a person. 





The subject remembered this episode in the inquiry at the end of the 
interview. She felt she needed help putting the words in the right order. 
In all, 14 subjects reported experiencing “cognitive confusion” at some time
during the masking conditions. Twelve subjects experienced losing their train
of thought or forgetting what they wanted to say; nine subjects reported a 
sense of having difficulty in expressing their thoughts such as finding the 
right word, keeping their words in the right order, or coherently organizing 
their thoughts. 

“Thinking aloud.” In this rare but striking event, subjects said aloud things
that they were thinking but apparently were not aware of and/or did not
intend to be audible. One instance occurred in the following interchange, at
the asterisked point. The subject, No. 1, is speaking of a recent event: her 
boyfriend did not take her to a veterans’ group party after all. 


Subject: Well, it’s strange. All along I assumed that I was going. He kept saying, 
“I’m not taking anyone to this.” He kept saying, “I . . . I wouldn’t take a dog
there” (laughs). He said, “I don’t like the locality, I don’t like the neigh- 
borhood.” It’s a very tough neighborhood. And he . . . he kept saying that he 
thought it would just end up into be a . . . being a beer brawl. And he said he 
didn’t want me around. But I thought he was kidding all the time. And so up 
until Friday I thought I was going, even though I wasn’t. (laughs) So I talked to 
him Friday and it became more clear to me what was happening, because he 
said he was going to give out drinks all night. And . . . ah. . . so I realized that 
he wasn’t just fooling around. In the beginning though I just thought he was 
just joking. Because we go every place together. 


Interviewer: Uhuh.
* Subject: And, -?-?-?-? (incomprehensible, low-volume mutter) 
Interviewer: What did you say then? 


Subject: What did I say then? Oh, I was just thinking about the party. He was
telling me. . . 


Interviewer: What . . . what were your thoughts? 
Subject: I was thinking about the beer party.
Interviewer: Yeah. 


Subject: I was just thinking about the beer party. Because he said that they were
nine deep at the bar and he said he couldn’t give out the beer fast enough (said
laughing).


The incomprehensible, low-volume mutter occurred at a time when the
subject was “just thinking.” To that extent, she thought out loud.


Early in the first interview, a 27-year-old male subject (No. 16), a Jew, 
said, “And my father’s been dead for 20 years,” while speaking without the 
masking noise. The next reference to “father’’ occurred in the following 
interchange when the subject was speaking under a masking condition of the 
same interview. 


Interviewer: When was this that you lived in X____ and Y____?


Subject: I lived in X____ when . . . I musta been about 1 or 2 years old. Then
we moved to Y____ and . . . ah . . . lived there for 2 years. And then my
father got a position in Z____, and ah . . . we moved to Z____. And
that’s where he died.
















Interviewer: What kind of work did your father do? 
Subject: Well my father, av’sholom, was a rabbi. 


Interviewer: Mhm. Ah... you sa... you said a word there. You said, 
“My... 


Subject: Oh. Ah. . . that means... ah... “may he rest in peace.” In. . . 
in...when...in...Je.., when you're Jewish, whenever you speak of a 
dead person, you always say that—which means “may he rest in peace.” 


The subject had not, however, said “av’sholom” as he spoke earlier of his
father, even of his father’s death, when he could hear himself. Nor did he 
ever use this expression again when speaking upon several occasions about 
his father. 

His father, who died when the subject was 7, was “an extremely Ortho- 
dox Jew.” The subject regarded himself as a Conservative Jew, for he kept a 
Kosher home and observed the Holidays. In the third interview, the subject 
was aware of thinking but not saying “av’sholom” as he spoke of his dead,
Orthodox aunts. As he thought about the incident of the first interview he 
realized that he did not use this expression when speaking with someone 
who was not a Jew, and that upon meeting the interviewer he had pre- 
consciously categorized him as not Jewish. Finally, the verbatim extract 
shows how the subject's speech became very “flustered” when the interviewer
asked about the use of “av’sholom.” These facts suggest that this
subject always said “av’sholom” when speaking of the dead, but to himself
when speaking with a non-Jew, and that he only spoke it aloud with the 
interviewer because of the influence of the masking condition. 

The next-to-last example of cognitive confusion contained an instance of 
“thinking aloud” that is more complicated than the preceding ones. It is the
utterance out-of-context, and at low volume, of the thought fragment, “I get 
along.” (see p. 353) The schematization of Fig. 21.1 illustrates what is 
assumed by the author to have happened after the interviewer asked: “Get 
along okay together?” The intended statements, those at the top, are unified 
both syntactically and acoustically. The “I get along” is a syntactic and 
acoustic foreign element. It seems to belong to an inhibited, conflicted line 
of thought, concurrent with the intended statements. The instigation to 
utter the inhibited line of thought seems to have become stronger with the 
passage of time. This produced another conflict, one between uttering the 
intended statements and those belonging to the inhibited line of thought. 
The conflicts are manifested in the pausing behavior and the conjoined “I 
get along.” From the subject’s standpoint, the conflicts were resolved in 
favor of his intended statements. For the moment only, however. A rereading
of the entire passage excerpted earlier will show that as the subject
continues he speaks of how his wife feels. But then there occurs the
statement:


And I’m not financially . . . or . . . ah . . . ah . . . I don’t have enough time
to . . . ah . . . go out dancing or go to the . . . movies . . . or spend as much
time away from my studies as I would if I wasn’t married.


which is completely unrelated to what he has been saying, causing the
listener to be confused. It seems quite possible that this statement is a result
of, and thus further evidence of, the private inhibited line of thought we
have inferred to be accompanying the public, intended statements.


[Figure: VOICE VOLUME plotted against TIME, with a THRESHOLD line. Above the
threshold: “Subject responds by saying ‘We get along very good. I mean . . .
ah . . . we get along fine.’” Below the threshold: “Subject also responds by
thinking but inhibiting something unknown to us.”]

FIG. 21.1 Schematization of processes hypothesized to account for subject's
thinking aloud, “I get along,” which is spoken at a barely audible intensity
and out-of-context.


Changes Over Time of Externally Observable Effects of Masking Noise. The 
precise determination of changes over time was not attempted in this study. 
This could only be done through a great deal of additional scrutiny of the 
tapes where this determination was the sole purpose. Nevertheless, the 
writer did form some definite impressions. Those pertaining to the degree to 
which effects were sustained during individual noise conditions were pre- 
sented in the discussion of the individual effects. Two further impressions 
were formed. 

Immediacy of Speech Changes. Speech changes dramatically the moment the 
masking noise is administered. Colleagues and audiences who have heard 
excerpts of the tapes have been struck by this fact. Immediate increased
loudness, flattening of intonation, and changes in “voice style” are especially
striking. The second generation Italian-American, male subject
(No. 13) who has been mentioned (p. 343) illustrates the immediacy of 
change very clearly. Before the masking noise was first administered he 
spoke softly and precisely and sounded effeminate and obsequious. The 
moment the masking noise was administered he spoke loudly, with flattened 
intonation, and sounded harsher, more masculine and aggressive. And his 
lower social status dialect characteristics immediately became noticeable. 
Before the masking he sounded like an “overly refined young man”; the 
moment the masking occurred he sounded like a “tough kid.” The masking 
noise was deliberately turned on and off for brief periods in one of the noise 
conditions of the third interview to obtain a record that would demonstrate 
the immediacy of its effects. His two styles of speech changed regularly, as 
though they were being switched on and off. 

In all the subjects, the return to normal speech with cessation of the 
masking noise was just as striking as the masking-induced change. Occasion- 
ally, a definite transition period of about 30 seconds could be observed. One 
got the impression in these instances that the subject was in the process of 
regaining his auditory feedback, as though his auditory threshold was de- 
creasing over a brief period of time. 

Adaptation. The category showing the most noticeable evidence of adap- 
tation over the three interviews was loudness. Subject 17 was even able to 
talk in Interview III with only a little difference in volume in the masking 
and nonmasking conditions. He had set volume control as a definite goal for 
himself. Although he finally approached it, he didn’t completely achieve it. 

Generally, however, signs of adaptation were neither remarkable nor
consistent. By the end of the third interview a few subjects gave a general
impression of some adaptation, but most did not; and a few subjects became 
increasingly affected by the masking conditions. Furthermore, trends over 
interviews were not always in the same direction in all categories. Some 
subjects who showed adaptation in loudness, for example, made more vocal 
noises in the third interview than earlier. The late occurrence of slurring was 
noted previously. 

Subjective Experiences in Masking Conditions 

The subjective report data were sometimes congruent with the externally 
observable behavior discussed earlier and sometimes they were not. Gener- 
ally speaking, the subjects’ reported experience of the masking conditions 
was negative, more so initially than after the first interview. Some important 
exceptions are cited later. Also, the inquiry data are compared with the
impression formed by the interviewer of the subject’s general reactions to the
masking conditions. 







Fifteen of the 17 subjects felt it was easier to talk without the masking 
noise in Interview I. Nine subjects felt this way in the third interview. The 
major reported aversive qualities of the masking conditions included the 
noise itself, the physical effort of talking in its presence, the inability to hear 
oneself, the interference with cognitive processes of “keeping one’s train of 
thought” and verbal articulation, a sense of loss of contact with oneself or the 
interviewer, and negative affects. It is not surprising that nearly a third of the 
subjects felt angry during the masking conditions of the first interview. The 
aversive qualities were especially marked at that time; nearly all of them 
were reported considerably less frequently after the first interview. Corre- 
lated with this change was the report of considerable adaptation to the mask- 
ing condition by every one of the subjects and an increase from two subjects 
at the time of the first interview, to eight in the third interview who did not 
find it more difficult or unpleasant to talk with the masking noise. This is the 
general outline of the reported subjective experiences in the masking 
conditions. 

The overall picture conflicts in two important ways with the impression 
produced in the interviewer by the overt verbal, vocal, and general behavior 
of the subjects. First, the subjects did not appear to be “suffering” as much 
as is suggested by their reports, even in the first interview. In fact, some of 
them seemed to enjoy the experience even in Interview I and certainly in 
the investigation as a whole. The interviewer was more struck with how 
easily the subjects talked than with initial or episodic disturbances in talking. 
As a group, the subjects impressed the writer with their involvement in the 
interview transactions and in the experiment. Not a single subject postponed,
arrived late, or missed an appointment. Secondly, there is no agreement be- 
tween the “objective” signs of adaptation and the subjective sense of it on 
the part of the subjects. Although every subject reported considerable adap- 
tation to the noise as a stimulus and to the lack of auditory feedback, few 
subjects gave objective evidence in their speech behavior of progressively 
and extensively adapting. In fact, four of the subjects distinctly appeared to 
be more affected in the second or third interviews than in the first. Yet one of 
these said it was easier to talk with the noise than without it in Interviews II 
and III. 

Turning now to illustrate and examine in more detail the negative subjective 
experiences, we begin with the stimulus properties of the noise itself. The 
noise reminded the subjects of rushing air or water, various mechanical 
sounds, and of radio or telephone static. When the noise was experienced as 
a noxious stimulus, as it was for fourteen subjects in Interview I, it was 
primarily because of its intensity and of its intrusive quality, both of which 
preempted the subject’s attention. The following paraphrased comments 
illustrate this kind of reaction to the noise in the first interview. 




It was a hostile sound coming in on you. It wasn’t welcome. It was distracting 
to have that noise going around in my ears. It was like this was inside my head. 
When I start talking my brain doesn’t grasp onto the thought and right away the 
noise becomes the most prominent thing. 


Some subjects, six in Interview I and two in Interview II, complained that 
talking with the masking noise was effortful. Some of their comments were: 


You had to force your thoughts to take place and to express yourself in words 
rather than just talk naturally. The first time I was worn out after I left. I was 
too pooped to pop. 


Not every subject complained about the fact that he couldn't hear him- 
self, but 12, 8, and 2 did so in Interviews I, II, and III respectively. 

Difficulty in cognitive organization and syntactic expression was reported 
nearly as often as the noxious properties of the noise in Interview I and more 
frequently in Interviews II and III. Subjects often attributed this difficulty
both to the intrusive-preemptive quality of the noise and to the feedback 
deficit, as the following paraphrased reports demonstrate. 


Subject No. 7: I felt as if I couldn’t express myself in any way possible because I
couldn’t hear myself. I wasn’t sure I was using the right words, verb forms, and 
sentence structure. And I think I was more inclined to forget what I had just 
said. 

Subject 9, Interview I: In the beginning it really distracted me. I couldn't think 
of what I was saying. I had a sensation that I wasn’t really speaking. I almost 
felt that I was thinking of this. (i.e., instead of speaking). That was when I 
coughed and realized I was speaking. I wasn’t quite sure of what I was doing. 
Interview II: I didn’t find the noise as distracting this time as last time. I still
had to fish around a few times for a particular word I wanted. Maybe it’s 
because I can’t hear it and I don’t know whether I’m saying it right or not. 
That's why with that word “sict”: I was trying to say “afflicted” and I didn’t
know whether it was going to come out right so I just didn’t say it. 





Some subjects reported what we have come to call “Loss of Contact with 
Self or Interviewer” during the masking condition. This experience was most 
frequently reported in Interview I and when the subject was deprived of 
both auditory feedback from his own voice and visual contact with the inter- 
viewer. The following paraphrased remarks indicate the nature of the 
experience. 


Subject 1, Interview I: It’s harder to talk with the noise. But it was easier when 
you were in front of me than behind me, because at least there was some 
contact with something, something you could see, something concrete.
Whereas when you're behind me it’s very abstract. I felt there was no one there
to talk to really even though I realized you were behind me. 


Subject 8, Interview I: With the noise I felt like I was talking to myself. I talk but
I don’t seem to have a part in what I’m saying. It’s even funnier facing away 
from you. It’s like talking to no one; it’s like nobody’s there. Interview II: It's a 
lot harder to talk with the noise on. You don’t know how you're saying things;
you don’t even know what you're saying; you sort of forget things; you don’t 
know if you did or didn’t say something. It leaves you with an empty feeling. 
And the noise seems to push you back. It seems to put you out of reality in a 
way. It feels like you're talking through clouds or something. 


Subject 10, Interview I: It made me feel like I was sitting in a big room and the
room was so big that as I was talking I wasn’t getting any rebound off the walls 
and my ears. I was just talking into emptiness. 


Subject 11, Interview I: It took me away from reality. It was like being separated
from everything, being in some kind of ether. I mean some never-never land.
It was more subjective. I was all alone with the sound. (In B-N condition.) 


All of the preceding negative aspects of the masking-noise condition were 
stated or implied to be causes of general discomfort for the subjects. Except 
for the physical effort required in talking, these negative aspects were also 
the causes for the more specific affects of anxiety and tension, and for anger 
as well. It is worth noting that the reports of anxiety and tension were the 
only ones of the “negative reactions” that did not decrease from Interview I 
to Interview III. 

The general decrease in frequency of negative references about the mask- 
ing conditions was associated with reports by every subject of considerable 
adaptation by the third interview. Even the nine subjects who stated that it 
was easier to talk without noise than with it in the third interview reported 
considerable adaptation by that time. 

Six subjects specified certain “positive” features of the masking conditions. And at 
one time or another during the interview series, four subjects preferred 
talking in the masking condition whereas five others reported having no 
preference between the masking and the unmasking condition. For seven of 
these nine subjects, the masking condition achieved its preferred or neutral 
status only in Interviews II and III, indicating the importance of experienced 
adaptation. 

A most unusual and informative case was Subject 5, a young woman, who 
from the first interview preferred the masking conditions, even though she
found the noise itself a noxious stimulus in that interview and complained 
that it interfered with thinking and verbal expression. She insisted that she 
didn’t like to hear her own voice and felt more relaxed and at ease when the
masking noise prevented her from doing so. Her singular reaction is consistent
with the total picture that emerges from observing her speech style
and considering it in the light of her social context and personal history. This 
young woman persistently tried to speak the dialect of the American stage 
and theatre. Her normal speech style reminded the interviewer from the 
beginning of Bette Davis’—in caricature. She spoke with exaggerated artic- 
ulatory movements and strenuous attempts at acoustic precision. Yet she 
came from a lower middle class Italian-American background. The total
impression was one of affectation. This configuration of factors indicates that 
she was concerned with how she sounded. One determinant of her speech 
style was an avowed aspiration to be an actress. Another, fully conscious, 
determinant was an “obsession for perfection” in action and speech, which 
she attributed to her mother’s perfectionism and to her own sense of worth- 
lessness. Her relevant remarks, paraphrased and condensed, were: 


Subject: My mother required perfection. She always reprimanded me for the 
way I acted or the way I talked. My sister was a year older and she mastered 
things before I did and spoke better. And before I knew it, nothing was quite 
right anymore in the way I walked, the way I talked, or the way I did things. 


Interviewer: Why don’t you want to hear your voice? 


Subject: I have an obsession for perfection. I don’t like my voice. I don’t feel it's
good. It comes from my mother pestering me all these years about the way I 
speak, about my voice and my diction. 


The subject also reported that when she talked with the masking noise she 
felt more relaxed generally, and her mouth seemed to move more freely than was 
true without the noise. 

We infer the following from these data. The subject had internalized her 
mother’s observing and critical functions. When she spoke under normal 
conditions she was continually trying to be “perfect” and was continually 
listening to her speech to see if it was “perfect,” that is, as good as her rival 
older sister’s and as good as her actress ideals. She judged it negatively but 
continued trying to speak perfectly. Tension and effortful speech were some 
of the results. The caricature that resulted may also have had a hostile 
component directed toward her mother, for even her “perfect” speech an- 
gered her mother. When the subject couldn’t hear herself speak, she was 
spared both the sense of worthlessness prompted by what she felt were 
“imperfect” qualities of her voice and diction and the strenuous attempt to
overcome this low self-esteem. Now she could relax and articulate more 
freely. 

One might expect this subject, normally so self-conscious about her 
speech, to become anxious instead of comfortable when she couldn’t hear 
herself speak because she then had no way of knowing if she sounded
“perfect” or not. The reason this did not happen may be related to her
report that talking with the masking noise, while facing away from the 
interviewer, “was close to sitting and daydreaming.” This was another 
positive aspect of the masking condition for her. Talking in the masking
condition was not like talking, it was like being half divorced from reality in 
fantasy, when she would have no reason to be concerned with how she 
sounded. Apparently the capacity of the masking condition to decrease con- 
tact with reality and herself brought positive relief from such concerns to this 
subject, not discomfort, even panic, as it did in the instances of negative 
reactions discussed above. 

Subject 14 was also unusual in that he had no preference for masking or 
nonmasking in any of the interviews. He enjoyed hearing the noise, at least 
in Interview I, because it had a hypnotic quality for him. He compared it 
pleasantly with static he used to pick up with his short-wave radio as a boy 
that he used to fantasy was the sound of the surf on some island shore far off 
in the ocean. He experienced talking in the masking condition in Interview 
II as a comfortable state akin to fantasy. He said it was “like musing” and 
that then the interviewer became just part of his “own imagination.” Appar- 
ently these positive elements were balanced by negative ones, some of 
which he specified. He felt he skipped syllables occasionally with the noise 
and “felt a spasm of resistance at letting out all these intimate things” which 
he attributed to “an urge to confess” in the masking conditions. 

Other subjects reported positive experiences similar to those of the pre- 
ceding two subjects. Subject 1 preferred the masking-noise conditions in 
Interview III because she felt as though she were “escaping reality,” and she 
felt “more secure” because she experienced “less fear of criticism” for what
she said when she couldn’t hear herself. Subject 11 reported greater comfort 
with the noise in Interviews II and III. He specified in Interview II that with 
the noise he didn’t have to hear unpleasant things he said, by which he 
meant certain personal details of his life causing him shame and embarrass- 
ment, which he felt an inner compulsion to tell. His comment in the Inquiry 
of Interview III, “the sound is an old friend,” implies that the same sense of 
security was produced by the masking condition in that interview also and 
was a reason for preferring that condition then. 

Subject 8 experienced increased visual vividness of, and greater
accessibility to memories of her childhood in the noise conditions of Interviews
II and III. She seemed to enjoy being thus “pushed back by the noise.”

Seven subjects reported an interesting phenomenon at one time or an- 
other: the intensity of the noise seemed lower when they talked than when 
they were silent. We will call this the “attention phenomenon.” 





Subject 5: You can deafen the noise with your own thoughts. 




Subject 8: When the noise was on, you sort of wanted to keep talking so you 
wouldn’t hear the noise. You couldn’t hear yourself anyway, but at least you 
had something to think about when you were talking. 


This phenomenon raises a question about the mechanism responsible for the 
increased verbal output in the masking conditions. In our choice of the 
category label “freer associative response,” we implied that a mechanism of 
disinhibition was responsible for the increased output. The phenomenon 
now discussed, especially as formulated by Subject 8, raises the possibility 
that the increased verbal output may occur because speaking is rewarded by 
decreasing the aversiveness of the masking condition. The following com- 
parison of the increase in verbal output in the masking conditions for those 
seven subjects reporting this phenomenon with that for those subjects not 
reporting it bears on this question. These data are inconsistent with the 
avoidance hypothesis. 


                              Mean % increase in verbal
                              output in masking conditions
                              of Interviews II and III

Subjects reporting
attention phenomenon                    17%
Subjects not reporting
attention phenomenon                    28%
                                        p = .20 (t)


Discussion


Many striking changes were observed in the speech and in the more general 
behavior of our subjects during the masking conditions. The fact that the 
subjects were as a group quite intelligent and of high-verbal skills perhaps 
makes the changes all the more impressive. We do not claim to have ob- 
served all the effects of the experimental condition. This survey only deals 
systematically with those effects that originally caught our attention as we 
conducted the interviews and screened the tape recordings. Others ap- 
proaching the interviews with different perspectives might find additional 
effects of the experimental manipulations. 

In the following discussion we consider the reliability and generality of 
these observations, the probable cause of the changes associated with the 
masking conditions, the possible role of feedback in “ego-functions,” and 
some miscellaneous ideas and questions suggested by this study. 




Partial Confirmation in the Work of Others 


How reliable are the observations of the linguistic and psychological 
changes associated with the administration of the masking noise? This ques- 
tion arises because many of the observations consist of the “subjective” 
judgments and inferences of a single observer. How general are the presumed
effects of the masking noise? Are they unique to these subjects, to the 
interview situation involving this particular interviewer, etc? Although we 
are aware of only a few other studies of completely masked speech, their 
results and ours appear to be very similar. 


Changes in Speech. Several investigators have noted various changes in 
the speech process. Shane (1955) and Cherry and Sayers (1956) found that
stutterers spoke much more fluently when their speech was masked by 
noise. Birch (1956) and Birch and Lee (1955) observed a similar change in 
the disturbed speech of expressive aphasia. These findings are comparable 
to ours in showing that speech in general is altered by masking. They also 
bear on our observations of a possible disinhibition process, as is discussed 
later. 

Most of the specific categories listed in Table 21.1 as “linguistic changes” 
have been observed by others: increased loudness, flattened intonation, and 
prolongation of syllables have been reported by Wood (1950) and Klein 
(1965). Wood, Klein, and Shane also observed changes in “voice quality.” 
Also, Wood noted increased pitch; Klein and Shane, slurring; and Shane as 
well as Holmes and Holzman (1966), changes in rate of speech. The only 
“linguistic” categories in Table 21.1 not reported by others are those of vocal 
noises and more distinct phrasing, although the latter may very well be a 
manifestation of “increased editing” observed by Klein in some subjects. 
These two phenomena were so apparent when they occurred that the lack of 
confirmation by others is not regarded as a serious matter. Wood noted, as we 
did, that the most varied speech changes occurred immediately, or nearly so, 
upon the onset of masking. 


Behavioral-Psychological Changes. For the purposes of this discussion we 
group together the externally observable changes listed under this heading in 
Table 21.1 and the subjective experiences reported by the subjects in the 
inquiries. Here, too, most of our important basic observations have also been 
noted by others. 

Friedhoff, Alpert, and Kurtzberg (1962) investigated the expression of affect 
in voice intensity when subjects were required to lie repeatedly to the exper- 
imenter. The intensity of these subjects’ voices did not change upon lying 
when they could hear themselves. But it did change if the subjects could not 
hear their own voices because of a masking noise. Upon lying, under this
condition, the voices consistently became louder or softer, depending on the
individual concerned. If one regards these vocal changes as manifestations of 
affects associated with lying, then these findings are comparable to our 
observation of increased affect expression in the masking conditions of the 
interviews. 

Our associative response category referred to both the quantity of talking 
and a variety of qualitative attributes, such as the degree of spontaneity, of 
defensiveness, and the readiness to reveal intimate and personal material. 
We judged from external criteria that the associative response was freer 
during the masking conditions. Certain of the subjects reported that they 
experienced this change. Klein (1965) observed similar phenomena when 
subjects were asked to respond to paintings, Rorschach cards, and stimulus 
words under normal hearing and masked auditory-feedback conditions. The 
introduction of a white masking noise increased the quantity of responses, 
the vividness of imagery, and the number of drive-related contents in the 
imagery. The subjects experienced their imagery as livelier and more vivid. 
Holmes and Holzman (1966) asked subjects to tell about a very embarrassing 
experience using only a nonsense language of their own invention instead of 
English. Under white noise masking, the subjects spoke in significantly 
longer utterances and tended to begin speaking sooner than under normal 
conditions. Thus the associative process became freer in both of these 
studies. 

Although no investigator has reported observations exactly like our thinking
aloud category, as far as we are aware, the fate of English words that the
subjects of Holmes and Holzman had to inhibit and translate into nonsense 
language was that they were frequently “thought aloud” in our terms; that 
is, more English words crept into the accounts spoken in nonsense language 
during the white noise than during the normal condition. 

Cognitive confusion phenomena, apparently quite similar to those observed 
in and experienced by our subjects, were also noted by Klein. 

Increased affect expression, freer associative responding, “thinking 
aloud,” as well as the preference of some subjects for speaking with the 
noise when they couldn’t hear their (to them) unpleasant voices and the 
distressing things they were saying, were among the reasons for our hypoth- 
esizing (Mahl, 1960) that disinhibition took place in our subjects under the 
masking conditions. We discuss this hypothesis in more detail in a moment. 
We mention it here because the studies by Klein (1965), Holmes and
Holzman (1966), and others bear on it. The quantitative changes reported in
the first two of these studies are consistent with this hypothesis, as are the 
qualitative changes in the responses of Klein’s subjects. So is the difference 
in English word usage observed by Holmes and Holzman (1966). 

Stanton (1968) studied a more familiar form of disinhibition. He asked 
subjects to utter as many taboo words as possible to standardized listeners
under masking and normal conditions. The subjects said many more such
words when they could not hear their own voices. The majority of Stanton’s 
subjects expressed a preference for uttering taboo words under the noise 
condition. But we cannot be sure if this finding is comparable to similar
reports by some of our subjects, for Stanton’s control subjects who never 
experienced the noise condition expressed the same preference as did his 
experimental subjects. 

If one assumes that the subjects of Friedhoff, Alpert, and Kurtzberg 
(1962) were attempting to conceal vocal clues of their lying, then the ap- 
pearance of changes in voice intensity with the introduction of a masking 
noise is also a case of disinhibition. And so is the increased fluency of 
stutterers observed by Cherry and Sayers (1956), if one assumes that speech 
inhibition is the crucial factor involved in stuttering. 

In short, there are several results from other studies in addition to the 
present one that are compatible with the inference that an underlying pro- 
cess of disinhibition occurs when people speak during the masking-noise 
condition. Why disinhibition might occur is considered in a moment. 

Three other findings remain to be checked against the results of others: 
the loss of contact with self or interviewer, the attention phenomenon, and 
adaptation, both observed and experienced, to the masking condition. 
There have not been reports on the latter two phenomena. Klein’s (1965) 
preliminary report suggests that some of his subjects, too, felt a change in 
sense of self and reality. Thus, he notes that some subjects reported uncer- 
tainty about what they were saying and he comments on a “feeling of isola- 
tion” that may have been induced partly by the masking noise and partly by 
the physical isolation of his subjects in a darkened room. On the whole, there 
is less confirmation of our findings concerning the subjective experiences 
than of the externally observable changes in linguistic and more general 
processes. This may be due to the fact that the other investigators have not 
studied the subjective experiences in great detail. This appears to be the 
case. 

The fact that most of our basic observations have also been reported by 
others is important on two counts. First, it indicates that these observations 
were reliable, for in effect ours were independent ones. Until several years 
later, we did not know of any of the studies reviewed except that of Cherry 
and Sayers (1956) and those of Birch (1956; Birch & Lee, 1955). The papers 
by Wood (1950) and Shane (1955) were published in obscure places; the 
others were reported after this study was completed. Second, the basic 
phenomena listed in Table 21.1 appear to have a high degree of gener- 
alizability across samples of people and various speaking situations. Most of 
the linguistic changes, for example, have occurred if the subjects read aloud 
as they did for Shane (1955) and Wood (1950), or if they spoke spontaneously 
as they did for us and for Klein (1965). Furthermore, both linguistic changes
and the rather striking, more general psychological changes have not only
been observed in the interview situation used here, but in the various situa- 
tions used by Holmes and Holzman (1966), Klein (1965), and Stanton 
(1968), each of which was markedly different from the others. It is quite 
obvious that some genuine phenomena have been observed in these studies. 


Cause of the Phenomena: Noise Input and/or Feedback Deficit?


This question raises many theoretical alternatives. To begin, the masking 
condition introduces two distinct elements: the stimulus input of the noise 
and the deficit in feedback of auditory cues from the subject's own speech. 
Either element might produce one or more different internal states: (a) The
stimulus input, for example, might increase the general level of activation or 
arousal, or it might specifically stress the subject. Or the noise input might 
cause the subject to automatically make an erroneous unconscious inference: 
that the interviewer also heard the noise. After all, in his previous experi- 
ence, any loud noise that interfered with his hearing also affected others 
present and required loud speech; (b) The significance of the feedback 
deficit, for example, might consist simply in the fact of the deficit, of there 
being something missing that is essential for normal or customary function- 
ing. Or the deficit might be a stressor because it disrupts the organization of 
the organism on many levels: the relationship to reality and to himself, 
cognitive organization, normal speech patterns, for example; (c) Various
factors, arising either from the noise input or from the deficit of auditory
feedback, might operate simultaneously. And they might do so either
convergently, so that every observed effect is multiply caused, or
heterogeneously, so that different effects result from different features of the
masking condition. In addition, some or all of the effects might be direct 
results of the factors mentioned or they might be indirect results of attempts 
to cope with the states engendered by the masking condition. Finally, differ- 
ent causes or mechanisms may be operative in different individuals. 

Some of these various conceivable attributes of the masking condition 
appear capable of explaining certain results. The input-arousal hypothesis, 
for example, could account for the greater loudness, affect expression, and 
verbal productivity. The simple deficit hypothesis can also account for these 
phenomena. Ideally, we should present at this point a detailed examination 
of the various interpretations, concluding with a statement of those
alternatives that seem the most likely, and then await the results of
experimental testing of these conclusions. Instead of engaging in that exercise, we
will present a provisional “armchair” evaluation of the alternatives that 
seems quite plausible, leaving the final explication and testing of the various 
alternatives where they belong—to future controlled empirical analysis. 

Considerable experienced adaptation to the masking noise occurred over 
the course of the six exposures to it in the three interviews. Every subject
reported experiencing some adaptation. There was also a decrease in the
frequency of negative references about the masking condition and in the 
number of subjects who stated it was more difficult to talk with the noise. 
Some subjects even developed a preference for the masking condition and 
some preferred it from the start. The observable phenomena, however, did 
not show a parallel change. They not only did not systematically decrease 
but actually became more prominent in some cases. These observations 
suggest that neither the novelty nor the stress caused the speech and other 
behavioral effects. Furthermore, a test of a stress-related hypothesis con- 
cerning increased productivity yielded negative results. 

When they spoke in the masking conditions, the subjects sounded like 
individuals who have become deaf after learning to talk. These people sound 
alike in at least two respects: in loudness and the flattening of intonation. 
The similarity is so striking that the speech of one would be mistaken for the 
other. This resemblance strongly suggests that the feedback deficit was a 
critical factor. But there is no reason why auditory feedback would function 
only in the regulation of loudness and intonation. It seems quite possible
that the feedback deficit was also responsible for other linguistic changes:
the prolongation, pitch changes, the vocal noises, slurring, rate changes, and 
exaggerated phrasing. These are all aspects of the speech skill that the 
delayed auditory feedback literature (Chase, Sutton, & First, 1959; Lee, 
1950a, b) has shown to be dependent on normal feedback. 

Viewed from the standpoint of either learning theory or psychoanalytic 
psychology, the remaining observable results—the cognitive confusions, the 
changes in voice quality, increased affect expression, freer associative
response, and “thinking aloud”—could also be due to a deficit in auditory
feedback. One must merely assume that the control of these aspects of 
behavior—one’s vocal style, the degree and quality of affect expression 
while talking, spontaneity and freedom in verbalizing, and cognitive organi- 
zation—is a negative feedback system. According to this assumption, any 
deviations in the auditory feedback from the vocal style, etc., which the 
individual characteristically “sets” for himself, constitute signals that acti- 
vate modulations and defenses that inhibit the audible deviations or the 
underlying processes instigating such deviations. In learning theory terms, 
such deviations in the auditory feedback would function as Hullian response- 
produced cues (rs) to which inhibitory responses and/or their drives were 
conditioned. In psychoanalytic theory, such deviations would function as 
self-produced stimuli perceived by the self-observing and self-evaluating 
superego functions that, in turn, instigate ego regulation and defense. 
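
The negative feedback assumption just described can be made concrete with a toy simulation. The Python sketch below is an illustration only, under stated assumptions, and not a model fitted to the present data: the names `simulate_loudness`, `set_point`, `drive`, and `gain` are hypothetical, and vocal loudness is reduced to a single number in arbitrary units. On each step an underlying impulse (`drive`) pushes the voice above the level the speaker "sets" for himself. When auditory feedback is available, the heard deviation from the set point activates a proportional inhibitory correction; when the feedback is masked, no deviation is registered and the correction is absent.

```python
def simulate_loudness(steps, set_point=60.0, drive=2.0, gain=0.5, feedback=True):
    """Toy negative-feedback loop for vocal loudness (arbitrary units).

    All parameters are illustrative assumptions, not measured quantities.
    """
    level = set_point
    history = []
    for _ in range(steps):
        # The deviation from the set point is perceived only when the
        # speaker can hear his own voice.
        heard_deviation = (level - set_point) if feedback else 0.0
        # The perceived deviation activates a proportional inhibitory response.
        inhibition = gain * heard_deviation
        # The underlying drive raises the level; inhibition opposes it.
        level = level + drive - inhibition
        history.append(level)
    return history


normal = simulate_loudness(20, feedback=True)   # speaker hears himself
masked = simulate_loudness(20, feedback=False)  # auditory feedback masked
```

With feedback, the level settles close to the set point, at the value where inhibition balances the drive. Without feedback, nothing in this toy opposes the drive, so the level simply keeps rising; a real speaker would presumably be limited by other cues, such as proprioceptive ones.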


Hypothesis: Disinhibition Due to Deficit in Auditory Feedback. The results
and the reasoning in the preceding paragraphs suggest that disinhibition due 
to the deficit in auditory feedback was a central result of the masking-noise
condition. Here we summarize the evidence and arguments in favor of this
hypothesis. 

In the present study, the behavior during the masking-noise conditions 
included such phenomena as: 


1. increased loudness,

2. less cultured speech,

3. more ethnocentric speech,

4. increased laughter,

5. freer expression of positive and negative affect and thought, which
sometimes concerned the interviewer,

6. increased amount of talking, reaching monologue proportions at times,

7. revelation of very intimate personal information,

8. the voicing of thoughts meant to be silent,

9. the misplacement of thought fragments in the flow of speech.


This all adds up to “strange” behavior under any conditions, especially on
the part of college students interacting with a stranger and a “college pro- 
fessor” at that. The assumption of a process of disinhibition makes this 
strange behavior understandable. As we noted earlier, the results of other 
independent investigators are compatible with this assumption. 

Several considerations suggest that this disinhibition is due to the deficit 
in auditory feedback and not to the noise input per se: (a) The failure to 
actually adapt to the masking condition in spite of the subjective experience 
of adaptation to the noise and the similarity of the subjects’ speech to that of 
the deaf are highly suggestive in this regard. (b) The immediacy of the
behavior change and the exquisite “off-on” nature of the relationship
between the changes and the use of the masking noise seem more compatible
with the deficit hypothesis than with the input hypothesis. (c) As we have
shown, it is theoretically conceivable that the disinhibition could be due to
the deficit in auditory feedback. (d) Many of the subjects’ reports support
this interpretation, for the relief from anxiety and self-criticism based on 
auditory feedback appeared to be the major reason why some subjects pre- 
ferred to talk under the masking condition. The fact that some subjects were 
distressed by the noise condition is not evidence against the feedback deficit 
hypothesis. Indeed, such a deficit appears to be frightening for some people 
for the very reason that it appears to be preferred by others, i.e., its
disruption of the customary drive-defense balance. In addition, the disturbance in
customary modes of thought and speech and in the sense of self and reality 
could be indirect noxious results of a sensory deficit for some people. 

Thus the hypothesis provides a relatively coherent explanation of the 
observable phenomena and many of the major experiential reports by the 
subjects. This hypothesis is consistent with conclusions about the role of
sensory feedback derived from studies of distorted auditory feedback and
the extensive studies of sensory deprivation. Indeed, our general behavioral 
categories are reminiscent of findings of the latter, as Klein (1965) has also 
pointed out. But the hypothesis clearly needs empirical testing. 

Stanton (1968) has made the first attempt at such empirical testing, in the 
study cited earlier. His design included a normal no-noise condition, and two 
noise conditions. In one of the latter, a white noise of complete feedback 
masking intensity was used; in the other, the noise was loud, but not quite 
loud enough to cause feedback masking. His subjects did not utter more 
tabooed words in the latter noise condition than in the no-noise condition. 
But they did utter more tabooed words in the masking-noise condition than 
in either the control-noise condition or the control no-noise condition. Thus
Stanton’s results not only demonstrated disinhibition when subjects could 
not hear themselves but also showed that a loud noise per se was not suffi- 
cient to produce this disinhibition. Further studies of this type are needed to 
determine which of the full range of behaviors observed by ourselves and 
others are functions of a feedback deficit and which might be caused by 
noise alone.⁵


The Possibly General Role of Sensory Feedback in Ego 
Functioning 


It is now well known that sensory feedback from the organism plays a 
crucial role in complex skills (e.g., Smith, 1962). Speech is simply a
particularly interesting and important case in point. The results of this study, and
those of Holmes and Holzman (1966), Klein (1965), and Stanton (1968) 
suggest that other ego functions may also be part of complex feedback 
systems. This discussion has emphasized the possibility that defense and 
other forms of inhibitory control are aspects of negative feedback systems. 
Klein’s (1965) discussion emphasizes the role of auditory feedback in sec- 
ondary process thinking. 

One naturally wonders, next, if the role of actual sensory feedback is not 
more extensive in ego functioning than is realized. There are two aspects to 
this question, both of which suggest problems for future research. On the 
one hand, one can ask: “What is the range of sensory feedback signals involved in
any particular ego function?” Consider the process of defense against
aggression, for example. Is it possible that proprioceptor feedback from the skeletal
musculature, somesthetic feedback from the autonomic musculature (the
“heat” of anger and the felt pounding of the heart, or their absence, for
example), visual feedback from one’s overt actions, auditory feedback from
one’s voice, as well as the feedback of one’s thoughts, all play a crucial role in
defense against aggression? Is it possible that deviations in such feedback
signals from those body sense perceptions one has come to tolerate are critical
for the instigation and maintenance of defense? Is the essence of defense the
minimization of such deviations from one’s customarily tolerable feedback
sense perceptions?

⁵Janis (1959, p. 214) proposed that the use of a masking noise might result in “verbalization
[that] would more closely approximate his own silent thoughts than if he could hear himself
talk.” On this basis, Janis and Terwilliger (1962) obtained subjects’ associations to fearful
communications under masking-noise conditions, but they did not test the underlying premise.

On the other hand, one can ask: “What is the range of ego functions in which
sensory feedback from the organism's activities plays a significant role?” Some of
our observations suggest that other complex ego functions, in addition to the 
processes of defense and organized thought, may be just as dependent on 
feedback as are sensory-motor skills. Feeling “empty” when deprived of 
auditory feedback, as some subjects did, suggests that sensory feedback is 
critical for one’s basic sense of being. An inability to maintain the presentation 
of oneself as the realization of a fantasied cultivated actress with the loss of 
auditory feedback, as well as the return of less cultured and more ethno- 
centric speech in several young adults striving for upward social mobility and 
greater acculturation, suggest that one’s sense of identity may depend on orga- 
nismic feedback. Perhaps the basic mechanism of achieving identifications is a 
negative feedback process in which the individual is continually monitoring 
the most assorted sensory feedback from himself and is continually minimiz- 
ing all deviations in this feedback from that totality of sense impressions 
arising from his memories and fantasies of the person with whom he is 
identifying. By such a process one could transform himself into a replica of 
the identification object. This would merely be a complex, generalized form 
of the process by which a hearing child presumably comes to speak as his 
parents do. Self-observation, self-evaluation, and self-regulation appear to be
major ego functions having general significance. But what is the basic mate- 
rial that is observed, and what is the observing, evaluating and regulating 
agent? The observations from studies of masked speech suggest that the 
basic material may consist, to a significant degree, of concrete sensory feed- 
back and that the observing, evaluating, and regulating agent is the brain and 
its sensory-perceptual and memory systems. Thus studies of the role of 
feedback in complex ego functions may bring together Freud’s conceptions 
of personality functioning set forth in chapter 7 of The Interpretation of Dreams 
(1900/1953a) and The Ego and the Id (1923/1961) and modern neurophysi- 
ology, to the mutual advantage of both approaches (see also chapter 4). 

These conceptions presume that extremely complex and intricate feed- 
back processes operate in behavior. This seems quite possible in view of the 
intricate, multifaceted feedback regulation of speech revealed by the studies 
that were reviewed in the introductory section of this chapter. 


Miscellany 


A few other aspects and implications of the study merit some brief 
comments. 

1. The effect of the masking condition varied with the individual. Klein (1965) 
found the same to be true in his study. This implies that people vary in the 
degree to which auditory feedback is involved in the regulation of their 
behavior. This, in turn, raises a number of questions for further investiga- 
tion. Some of them are: What accounts for these individual differences? Are 
they a function of personality differences? Does this individual variation 
reflect variation in the degree of “internalization” of behavioral controls? 
What other cues or mechanisms can replace those involved in auditory 
feedback? 

2. We have noted but not discussed the fact that some aspects of language
were essentially unaffected by the feedback deficit. This seems very interesting in 
view of the other marked effects it has. It would appear that some aspects of 
speech in adults are independent of auditory feedback. Some of them, such 
as the phonetic-articulatory, may be primarily dependent on kinesthetic 
feedback. Perhaps there are other aspects of language that can be, and have 
been—over and over again—performed silently and thus become indepen- 
dent of auditory feedback. All through life a person “thinks” in the gram- 
matical forms of his language. Such patterns can become independent of 
auditory feedback, in contrast to the strictly vocal, audible aspects of lan- 
guage. This consideration has two obvious research implications: one devel- 
opmental, the other cross cultural. Would the masking noise have greater 
effects on the “basic” linguistic patterns with younger subjects simply be- 
cause of the parallel decreased frequency of silent practice? Would feedback 
deficit have greater effects on the basic linguistic patterns as the different 
modes of silent practice varied? In children unable to read and write, for 
example, or in illiterate cultures? 

3. Subjects reported that when they talked the noise did not sound as 
loud as when they were not talking. This could be an “attention phe- 
nomenon.” The subjects’ reports supply striking confirmation of certain 
aspects of Freud’s (1900/1953a) theory of consciousness and attention as 
stated in chapter 7, The Interpretation of Dreams, and elsewhere. David 
Rapaport (1960, pp. 227ff) provided a brilliant, and the most useful, syn- 
thesis of this theory and a historical perspective of Freud’s thinking about it. 
Of the fourteen propositions (pp. 228-9) Rapaport derived from Freud’s
formulations, the following are especially relevant here: 


1. The subjective conscious experience is determined by the distribution 
of a limited quantity of mental energy termed attention cathexis. 







3. Attention cathexis is part of the energy of the system Cs-Pcs (in
present-day terminology, the ego) which is termed hypercathexis.

4. Excitations within the mental apparatus (internal) or on the receptor
organs (external) attract attention cathexes proportionately to their intensity. 

5. Attention cathexis, if so attracted and if exceeding a certain amount 
(threshold), gives rise to the conscious experience of the excitation. 

6. Simultaneous or contiguous excitations compete for the limited quan- 
tity of attention cathexis. 

9. Defenses and other processes utilizing great amount of hypercathexes 
diminish the quantity of attention cathexis available. (Italics ours. In
psychoanalytic theory, “other processes” would include the ego functions of speech
and secondary process-thinking such as our subjects were engaging in during 
the interviews.) 


When our subjects talked in the masking conditions they were doing two 
things that, according to Freud’s theory of consciousness, should have caused 
the masking noise to decrease in loudness. First, they were producing
receptor-excitation (proprioceptive) and “internal” excitation (ideational content
of verbal images) that competed for attention cathexis with the auditory
excitations from the masking noise (Propositions 1, 4, 5, 6). Secondly, when
talking, the subjects were utilizing “ego energy,” hypercathexes (Proposi- 
tions 1, 3, 9). Not only is Freud’s theory of consciousness supported by the 
raw data, it thereby provides one explanation of them. 
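
The reasoning in this paragraph can be followed step by step in a small numerical sketch. The Python fragment below is only an illustration of Propositions 1, 4, 5, 6, and 9 under assumptions of our own choosing: the function name `perceived_intensity`, the numerical values, and in particular the rule that competing excitations share the available attention cathexis in strict proportion to their intensity are hypothetical and are not taken from Freud or Rapaport.

```python
def perceived_intensity(excitations, total_cathexis=10.0, used_by_ego=0.0, threshold=1.0):
    """Return the attention cathexis each excitation receives (0.0 if subthreshold).

    `excitations` maps a label to an intensity; all units are arbitrary.
    """
    # Propositions 1 and 9: the supply is limited, and processes such as
    # speech and secondary-process thinking use up part of it.
    available = max(total_cathexis - used_by_ego, 0.0)
    total_intensity = sum(excitations.values())
    experienced = {}
    for name, intensity in excitations.items():
        # Propositions 4 and 6: competing excitations attract cathexis in
        # proportion to their intensity (an assumed allocation rule).
        share = available * intensity / total_intensity if total_intensity > 0 else 0.0
        # Proposition 5: only a share exceeding the threshold is experienced.
        experienced[name] = share if share > threshold else 0.0
    return experienced


# Subject silent: the masking noise is the only excitation.
silent = perceived_intensity({"noise": 8.0})

# Subject talking: proprioceptive and ideational excitations compete with
# the noise, and speaking itself draws on the limited supply.
talking = perceived_intensity(
    {"noise": 8.0, "proprioceptive": 2.0, "ideational": 2.0},
    used_by_ego=3.0,
)
```

Under these assumed values the noise receives less attention cathexis while the subject talks than while he is silent, which corresponds to the subjects' reports that the noise did not sound as loud when they were talking.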

Aside from the relation to psychoanalytic theory, the subjects’ reports of 
the basic phenomenon have some interesting implications. Generalizing 
from their reports, one would have this proposition: There is an inverse 
relationship between talking and the perceived intensity of receptor stimula- 
tion. Talking decreases it and not talking increases it. This implies some- 
thing potentially important about human interaction; that when a person 
talks he is less aware of cues emanating from his interlocutor than when he is 
silent. The generalization also implies something about intrapsychic func- 
tioning: that when a person talks he is less aware of stimulation arising from 
within himself than when he is silent. 

Common clinical observations and everyday experiences seem to be
consistent with these ideas. The “compulsive talker,” for example, is
inaccessible to the ordinary external influences in interaction and appears clinically to
talk in order to defend himself against inner sensations of anxiety. The latter 
is a form of resistance (defense) well known to the psychoanalyst. Under the 
guise of free associating par excellence, a patient may actually be
successfully avoiding experiencing his anxieties. The increased self-awareness that
often comes with silence may be due, in part, to the mechanism underlying 
the attention phenomenon. It is obvious that this phenomenon suggests a
wide range of problems for further investigation. Is the interpretation of the
subjects’ experiences as an attention phenomenon valid? What actually are 
the intrapsychic consequences of simply talking or not talking, disregarding 
content? What are the interactional consequences? Do nonverbal interac- 
tional cues, for example, have greater influence during brief pauses—when 
one is the listener, not the talker? 

4. The vocal noises generally occurred during pauses and at the onset of
sentences and phrases. Although some of them were described as resembling
“straining” sounds, it should be emphasized that the subjects did not show 
visible signs of making greater physical effort at these particular times. Nor 
were the sounds part of a speech-block pattern. The sounds, in their speech 
contexts, gave one the impression they resulted either from resting, aimless 
activity of the speech musculature during pauses, or from motor preparation 
for articulation at the onset of sentences or phrases. Obviously, slow mus- 
cular contractions or tensions in the speech apparatus produced them. Per- 
haps the same tensions are occurring regularly during normal speech, but at 
subaudible intensities. 

The findings of this study and their interpretation result in a view of the internal
states of talking people that is somewhat different than one suspects from their normal 
overt speech. Subaudible muscular tensions in the speech apparatus; silent 
speech, often unrelated to or in conflict with audible utterances and even 
unknown to the speaker; reined dispositions to use seemingly discarded 
idiolects and dialects; stronger affects and impulses than are manifested 
overtly; promptings to talk at length and to say more personal things; per- 
petual self-observation and self-regulation; the continual maintenance of a 
sense of self and of reality—all these must form part of the current of 
behavior flowing silently beneath the rippling surface of the stream of overt 
speech. 


Summary 


Seventeen college students, male and female, participated in three indi- 
vidual personal interviews that were tape-recorded. During the interviews 
the subjects spoke under four different conditions: (a) in the usual face-to- 
face situation; (b) when they couldn’t hear their own voices because of a 
masking noise administered through earphones; (c) when they couldn't see 
the interviewer who was sitting behind them, but could hear themselves; 
and (d) when they could neither see the interviewer nor hear themselves 
talk. Each interview concluded with an open-ended inquiry. The tape re- 
cordings and typescripts were primarily studied by the systematic, clinical- 
observational method; some supplementary quantitative procedures were 
used. 





This report focuses on the effect of the masking-noise conditions. During 
these conditions, all subjects spoke in a much louder voice, with flattened 
intonation and, at times, lengthened syllables—all of which gave their 
speech a “sing-song” nature. All subjects also showed changes of various 
kinds in their “voice quality.” Some of these changes were of a general 
phonological nature, such as increased nasality, whereas others included 
social class dialect shifts of a “regressive” nature. Changes in pitch, mean- 
ingless vocal noises, slurring of articulation, changes in rate of speaking, and 
more distinctive phrasing also occurred in the masking conditions, in de- 
creasing frequency. 

More general behavioral changes also occurred, including: increased af- 
fect expression; freer associative response, indicated quantitatively by in- 
creased verbal productivity and qualitatively by increased spontaneity and 
the communication of highly personal information; increased cognitive con- 
fusion; and “thinking aloud.” The latter, rare but striking, phenomenon 
consisted of the unintended and unconscious utterance of content often just 
above the audible threshold. In one instance, a subject disclaimed even the 
“thoughts” involved. 

All subjects reported experiencing considerable adaptation to the masking 
conditions, but there was no consistent external evidence of it. Control of 
loudness of the voice showed the greatest adaptation. 

Most subjects, especially in the first interview, preferred to talk without 
the noise. A small number, however, preferred talking with the noise and 
the nonnoise preference decreased as the interviews progressed. The major 
aversive qualities of the masking conditions included: the noise itself, the 
physical effort of talking in its presence, the voice feedback deficit, inter- 
ference with cognition and articulation, a sense of loss of contact with oneself 
and the interviewer, and negative affects. Many subjects reported that the 
noise sounded less intense when they talked than when they didn’t. The 
generally negative portrayal of the subjects’ experience contradicts the exter- 
nal impression of the subjects’ behavior during the interviews. No subject 
failed to complete the experiment and all appointments were kept. 

“Positive” reports about the masking condition included statements that 
the noise induced relaxation, a state akin to daydreaming, relief from reality, 
and self-criticism. 

It is important to note that the basic linguistic patterns persisted in the 
masking conditions: The subjects could still “speak English.” 

An evaluation of various mechanisms that might have mediated the ef- 
fects of the masking conditions leads to the conclusion that the feedback 
deficit was crucial and that normal auditory feedback plays an important role 
in the regulation of many vocal dimensions of language behavior, in the 
control of the vocal expression of affects and thoughts, and in the maintenance 
of a sense of self and of reality. Various additional implications of the 
findings were discussed.® 


ADDENDA (1986) 


We have examined the auditory feedback literature that has appeared since 
the preceding article was written, and we have reexamined some of the 
preexistent literature. Our search failed to locate any study replicating our 
use of white noise masking during extended, interactive, spontaneous dis- 
course. Yet there has been sustained investigation of exposure to loud noise 
and other related procedures. Generally these studies have involved brief 
episodes of speech and often the reading aloud of printed passages. Two 
notable exceptions are the studies of Holzman and Rousey (1971), in which 
subjects produced Thematic Apperception Test (TAT) stories, and Garber 
and Martin (1974), in which subjects uttered spontaneous monologues in 50- 
minute experimental sessions. Holzman and Rousey replicated the findings 
of their earlier study (1970), cited in our original article, that white noise 
masking caused an increase in impulse-related themes and a decrease in 
defensive themes on the TAT. Except for Holzman and Rousey’s, the studies 
have been concerned strictly with vocalization processes. Four emphases 
emerge from this literature: the effect of exposure to masking noises on 
stuttering, the effect of such noise on voice loudness (the Lombard effect), 
the effect of amplifying auditory feedback (sidetone amplification) on voice 
loudness, and developmental aspects of auditory feedback. 


Noise Exposure and Stuttering. The early observations that exposure to 
loud masking noise decreased stuttering (Cherry & Sayers, 1956; Cherry, 
Sayers, & Marland, 1955; Kern, 1932; Shane, 1946, 1955) provided the 
major stimulus for our study. Subsequent research has provided substantial 
confirmation of those pioneering observations (Adams & Hutchinson, 1974; 
Adams & Moore, 1972; Burke, 1969; Conture, 1974; Conture & Brayton, 
1975; A. D. Dewar, 1984; A. Dewar, A. D. Dewar, & Barnes, 1976; Garber 
& Martin, 1974, 1977; Maraist & Hutton, 1957; May & Hackwood, 1968; 
Murray, 1969; Sutton & Chase, 1961; Webster & Dorman, 1970). Although 
Cherry and Sayers concluded that the low-frequency band “red” noise (cutoff 
at 500 c/s) was much more effective than high frequencies in reducing 
stuttering, subsequent research indicates that high-frequency noise may be 
effective (Conture, 1974; May & Hackwood, 1968). Nearly all the studies 
cited, however, used a wide frequency band “white” noise. 


For a rigorous study and further discussion of the disinhibition hypothesis, the reader is 
referred to a later study by Holzman and Rousey (1970), published when these pages were 
already set in type. 


The nearly universal assumption among the researchers just cited is that 
the effect of the noise on stuttering is a direct consequence of interference 
with the auditory feedback of the speaker, which is assumed to be “hyper- 
salient” for the stutterer (Lane & Tranel, 1971) for whatever reasons. (We 
made the same assumption in our discussion of this phenomenon.) Several 
workers have challenged this assumption on various grounds. Sutton and 
Chase (1961) questioned it because they found that noise stimulation only 
during pauses in speech reduced stuttering just as much as noise stimulation 
during phonation or continuously during the testing session. Webster and 
Dorman (1970) replicated that finding. These findings do not necessarily 
negate the role of interference with auditory feedback. Noise stimulation 
during pauses could have produced temporary increased threshold shifts that 
interfered with auditory feedback. In our own study, we occasionally noticed 
shifts in speech upon turning off the noise that could be accounted for by 
recovery from such threshold shifts (see p. 358). Noise stimulation during 
pauses might also have distracted the speaker's attention from his own au- 
ditory feedback. Barr and Carmel (1969) found that high-frequency noise of 
moderate intensity reduced stuttering, even though the noise properties 
were such that it did not in itself completely mask auditory feedback. They 
tested each subject twice: The reduction in stuttering was much greater in 
the first test. Such results might be due to the distracting effect of the noise, 
for it might well have been greater in the first test, when its novelty would be 
greater. Thus, the results of these three studies do not rule out the pos- 
sibility that the masking of auditory feedback is the principal factor mediat- 
ing reduced stuttering upon noise stimulation. 


Noise Exposure and Vocal Intensity. Speakers automatically talk louder 
when stimulated with a loud noise—the Lombard effect (Lombard, 1910). 
This effect has been repeatedly confirmed in normal speakers in the literature 
we searched (Dreher & O'Neill, 1958; Gardner, 1964, 1966; Korn, 1954; 
Kryter, 1946; Pickett, 1958; Siegel & Kennard, 1984; Siegel, Pick, Olsen, & 
Sawin, 1976; Siegel, Schork, Pick, & Garber, 1982; Webster & Klumpp, 
1962) and in stutterers as well (Adams & Hutchinson, 1974; Adams & Moore, 
1972; Conture, 1974). Wingate (1970) based another alternative to the feed- 
back-masking hypothesis upon this Lombard effect. He proposed that the 
reduction in stuttering under noise stimulation might be mediated simply by 
the increased vocal intensity of the speakers rather than by the reduction in 
auditory feedback. He did not test this hypothesis, nor have others who 
adopted it merely on the basis of the co-occurrence of increased loudness and 
reduced stuttering as general effects of noise stimulation (Adams & Hutchin- 
son, 1974; Adams & Moore, 1972). Conture (1974) did determine the cor- 
relation between such changes in vocal intensity and stuttering: the two were 
highly correlated, .98, when one anomalous subject was omitted from the 
analysis. These group and individual associations between increased vocal 
intensity and decreased stuttering do not prove that the former caused the 
latter. Auditory feedback masking could have caused each of them. 

Garber and Martin (1977) compared the frequency of stuttering when 
subjects spoke with a normal level and an increased level of loudness, both 
with and without noise stimulation. Speaking in a loud voice did not de- 
crease stuttering in either quiet or noise, but all subjects reduced their 
stuttering in noise compared with the quiet condition. 

Thus, the hypothesis that increased vocal intensity mediates the decrease 
in stuttering upon noise stimulation has not yet been confirmed. That hy- 
pothesis does raise a comparable question about our findings. Could the 
increased loudness with which our subjects spoke under the masking noise 
conditions have mediated any of the other concomitant changes? That seems 
unlikely because adaptation in loudness, when it did occur, was not always 
related to adaptation in other phenomena. Future research, however, should 
address that question. 


Sidetone Amplification. In the introduction to our study we noted that two 
investigations (Black, 1950a; Lightfoot & Morrill, 1949) had found that in- 
creasing the intensity of a speaker's auditory feedback (sidetone amplifica- 
tion) causes the person to lower the intensity of his voice. Our subsequent 
search of the literature has shown that this phenomenon has been repeatedly 
demonstrated (Lane, Catania, & Stevens, 1961; Siegel & Kennard, 1984; 
Siegel, Pick, Olsen, & Sawin, 1976; Siegel, Schork, Pick, & Garber, 1982). 


Intraindividual Feedback or Social Communicative Loops. As was the case 
in accounting for the effect of loud noise exposure on stuttering, investiga- 
tors have been nearly unanimous in attributing, explicitly or implicitly, the 
Lombard effect and the results of sidetone amplification to changes in auditory 
feedback. Lane and Tranel (1971) provided a notable exception. In their 
tightly reasoned, trenchant paper, they concluded that both of these effects 
resulted from automatic attempts of the speaker to maintain a voice loudness 
level that is favorable for intelligible communication with the participant. 
Lane and Tranel argue, for example, that upon noise stimulation the speaker 
assumes that the hearing of the audience is impaired as his is and speaks 
more loudly to improve the ability of the listener to understand what he is 
saying. Analogously, they argue, a person speaks more softly when the experimenter 
amplifies his sidetone in order to obviate the assumed altered 
communication caused by his assumed loud voice. Thus Lane and Tranel 
shift the explanation from modifications in auditory feedback to changes that 
the speaker assumes have occurred in the reception and comprehension of 
his speech by the listener. They have replaced the emphasis on an intraindividual 
feedback loop with emphasis on the interpersonal communicative 
loop. 

Investigators of auditory feedback have neglected the proposal of Lane 
and Tranel. We found only one study that attempted an explicit test of it. 
Siegel et al. (1982) had speakers talk in various feedback conditions and they 
determined the intercorrelations of speech changes caused by those condi- 
tions. Of particular relevance here was the finding that there was no correla- 
tion between the voice intensity changes caused by the Lombard procedure 
and those caused by sidetone amplification. The reasoning and conclusion of 
Lane and Tranel require a significant correlation between the two types of 
changes. 

This single negative result does not destroy Lane and Tranel’s hypoth- 
esis. Perhaps the changes in the social communicative loop will be shown to 
contribute to the Lombard and sidetone amplification effects. But it seems 
highly unlikely that those inferred changes are the only ones that occur upon 
auditory feedback masking. Most of the changes observed in our subjects 
seem unrelated to attempts to maintain intelligible communication, or seem 
even contrary to such an aim. They seem related to disinhibition (see p. 
369). Others have reached similar conclusions (Holmes & Holzman, 1966; 
Holzman & Rousey, 1970, 1971; Klein, 1965; Stanton, 1968). Perhaps the 
intraindividual feedback and the social communicative loops interact in ways 
yet to be determined. 


Developmental Patterns. Near the end of the discussion in our chapter, we 
raised the issue of possible developmental patterns in the role of auditory 
feedback and of possibly differential roles for different aspects of speech. 
Our subsequent examination of the literature shows that these developmen- 
tal issues have received some attention, with as yet inconclusive results. 

The relevant developmental studies have been almost exclusively con- 
cerned with the effect of delayed auditory feedback (DAF) on temporal aspects 
of speech. There are some 10 such studies that have yielded contradictory 
findings, with about half showing a decreasing effect of DAF with increasing 
age and the other half showing the opposite effect. Differences in ages of the 
subjects and in procedure make it difficult to draw any general conclusions 
from these studies. The reader is referred to a recent paper by Siegel, Fehst, 
Garber, and Pick (1980) for a discussion of those studies, as well as the report 
of their own research into the issue. 

A few studies have been concerned with the developmental pattern of the 
Lombard and sidetone amplification procedures. Siegel et al. (1976) found 
that college age subjects were affected more than preschool children by 
sidetone amplification. Both age-groups were equally susceptible to the 
Lombard effect. The latter result is compatible with the finding of Crary, 
Fucci, and Bond (1981) that 6-9 year-old children and 19-25 year-old adults 
were equally likely to prolong vowels with auditory masking. Such findings, 
that different types of feedback changes seem to show different develop- 
mental patterns, led Siegel and his co-workers (Siegel et al., 1980; Siegel et 
al., 1976) to suggest that there may well be multiple feedback loops involved 
in speech that might follow different developmental courses. 


